Abstract

The well-known duality relating entangled states and noisy quantum channels is 
expressed in terms of a channel ket, a pure state on a suitable tripartite system, which 
functions as a pre-probability allowing the calculation of statistical correlations be- 
tween, for example, the entrance and exit of a channel, once a framework has been 
chosen so as to allow a consistent set of probabilities. In each framework the standard 
notions of ordinary (classical) information theory apply, and it makes sense to ask 
whether information of a particular sort about one system is or is not present in an- 
other system. Quantum effects arise when a single pre-probability is used to compute 
statistical correlations in different incompatible frameworks, and various constraints 
on the presence and absence of different kinds of information are expressed in a set of 
all-or-nothing theorems which generalize or give a precise meaning to the concept of 
"no-cloning." These theorems are used to discuss: the location of information in quan- 
tum channels modeled using a mixed-state environment; the CQ (classical-quantum) 
channels introduced by Holevo; and the location of information in the physical carriers 
of a quantum code. It is proposed that both channel and entanglement problems be 
classified in terms of pure states (functioning as pre-probabilities) on systems of p > 2 
parts, with mixed bipartite entanglement and simple noisy channels belonging to the 
category p = 3, a five-qubit code to the category p = 6, etc.; then by the dimen- 
sions of the Hilbert spaces of the component parts, along with other criteria yet to be 
determined. 

I Introduction 

Understanding entangled states and the properties of quantum channels are two central 
issues in quantum information theory. At least in a formal sense they are the same problem: 
the duality mapping one into the other has been discussed explicitly in recent work [TJ El El, 
and employed for various purposes in a much larger collection of papers; see [SJ 13 El M ITU] 
for a few examples in addition to those in the extensive bibliography in [3]. The early work
most often cited is [11, 12], though the basic idea is not complicated, and has undoubtedly
been rediscovered many times. Nonetheless, one has the impression that this duality has yet 
to be fully exploited, and much more could be done to relate the concepts used in discussing 
entanglement, and the large number of proposed measures of entanglement, to the ideas 
employed for thinking about quantum channels, and the definitions of many different sorts of 
channel capacity. Perhaps a barrier to its full utilization is the fact that this duality remains 
something of a mathematical abstraction whose connection with more physical ideas has 
not been totally clear. One aim of the present paper is to relate this duality to concepts of 
quantum information. To be sure, "information" as it applies to the quantum domain is not 
at present a very precise concept; the appropriate definitions remain the subject of current 
research and occasional controversy [THJ EH EE EE 113 CHI- The term is used here in the 
very broad sense of statistical correlation, an idea familiar in classical physics and classical 
information theory, which deserves to be better understood and more widely applied in the 
quantum domain. 

The duality under discussion can be formulated in various ways. One which seems par- 
ticularly helpful characterizes a noisy quantum channel using a channel ket, an entangled 
pure state on a suitable tripartite system; see Sec. II C for the precise definition. While
this idea is (at least) implicit in previous work, the main emphasis has been on the duality 
between a density operator describing a mixed state of a bipartite system and what we here 
call a dynamical operator (following [3], where the term dynamical matrix is used), closely 
connected to the superoperator describing the action of a quantum channel. The channel 
ket is obtained by "purifying" the dynamical operator using a (possibly fictitious) reference 
system; in turn, the dynamical operator is a partial trace over the projector corresponding 
to the channel ket. This relationship is well known and frequently exploited in the case of 
mixed entangled states (see, e.g., p. 110 of [E]). What is less well known is that there are 
certain advantages, both formal and conceptual, in using pure states rather than (or at least 
in addition to) mixed states when discussing the location of quantum information (see
Sec. IV), and thus occasions when a channel ket provides insights not directly available
from a dynamical operator. It should be noted that the principal role of a channel ket is 
the same as that of a dynamical operator or a density operator: it allows one to calculate 
probabilities for various properties of a quantum system. These probabilities determine the 
statistical correlations between events at different times that provide a physical description 
of a quantum channel, just as the statistical correlations between separate quantum systems 
at a given moment of time provide a physical description of entanglement. 

The remainder of this paper is structured in the following way. After introducing some 
conventions on notation in Sec. II A, the basic map-ket duality is reviewed in Sec. II B;
our treatment differs from previous ones mainly in maintaining what we think is a helpful
distinction between operators and their matrices. Channel kets are defined in Sec. II C,
with some simple examples in Sec. II D. Brief remarks on the inverse problem of turning
entangled states into channels are found in Sec. II E.

Quantum information in the sense of statistical correlations is the topic of Sec. III. Sample
spaces and probabilities for quantum systems are discussed in Sec. III A and applied
to correlated systems in III B. The notion of particular types of information about certain
subsystems being present or absent in other subsystems, which is central to our later discussions,
is introduced in Sec. III C for entangled states, and extended to quantum channels,
where the ideas are very similar modulo a partial transpose, in Sec. III D. These definitions
are qualitative and do not depend upon any quantitative measures of information. We be- 
lieve, however, that once correlations have been defined in a consistent manner, there is no 
barrier to using quantitative information measures, such as Shannon's mutual information; 
this should take care of the objections raised in ^3]. The point of view adopted here is 
consistent with and an extension of that in [To] . 

Following this, Sec. IV contains a set of "all or nothing" theorems that apply to qualitative
aspects of information. These theorems have a number of interesting consequences, some
of which are discussed in Sec. V, where they are applied to two special types of quantum
channels, mixed-state environment and "CQ" channels, and to the problem of the
location of information in quantum codes. In Sec. VI we propose a scheme, at present rather
tentative, for classifying both entanglement and channel problems in terms of pure-state
entanglement on p-part systems.

The conclusion, Sec. VII, has both a summary and a list of open problems. Appendix A
contains the proofs of the theorems of Sec. IV, and App. B a particular result on bipartite
entangled kets used in App. A.



II Map-Ket Duality and Channel Kets 
II A Notation 

We shall use subscripts a, b, c, etc., and sometimes numbers, to label different subsystems
of a system with several parts. The Hilbert space $\mathcal{H}_a$ is associated with system $S_a$, the tensor
product

$$\mathcal{H}_{ab} = \mathcal{H}_a \otimes \mathcal{H}_b \tag{1}$$

with the combined system $S_{ab}$ consisting of $S_a$ and $S_b$, and so forth. For a ket $|\psi\rangle \in \mathcal{H}_{abc}$ we
use the notation

$$\psi = [\psi] = |\psi\rangle\langle\psi| \tag{2}$$

where the square brackets distinguish a dyad from other types of operator. Partial traces
are denoted by

$$\psi_{ab} = \mathrm{Tr}_c(\psi), \quad \psi_a = \mathrm{Tr}_b(\psi_{ab}) = \mathrm{Tr}_{bc}(\psi), \tag{3}$$

and so forth, both for dyads and other operators. Operators on the Hilbert space $\mathcal{H}_a$ themselves
form a Hilbert space $\hat{\mathcal{H}}_a$ with inner product $\langle A, A'\rangle = \mathrm{Tr}(A^\dagger A')$.

Because the subscript position is used to label the (sub)system, indices are often written
as superscripts in circumstances in which they are not likely to be confused with exponents.
Thus $\mathcal{P} = \{|p^j\rangle\}$ denotes an orthonormal basis for the Hilbert space $\mathcal{H}_p$ of dimension $d_p$,
with j taking values between 0 and $d_p - 1$. Two such bases $\mathcal{P}$ and $\bar{\mathcal{P}} = \{|\bar{p}^k\rangle\}$ are called
mutually unbiased if

$$|\langle p^j|\bar{p}^k\rangle| = 1/\sqrt{d_p}, \tag{4}$$

independent of j and k. 
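
As a concrete check of condition (4), the following minimal Python sketch tests whether two qubit bases are mutually unbiased. The helper names `inner` and `mutually_unbiased`, and the choice of the standard (z) and x bases as the example, are illustrative assumptions and not notation from this paper.

```python
import math

def inner(u, v):
    """Inner product <u|v>; the first argument is conjugated."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def mutually_unbiased(basis1, basis2, tol=1e-12):
    """Test Eq. (4): |<p^j|pbar^k>| = 1/sqrt(d_p) for every pair (j, k)."""
    target = 1 / math.sqrt(len(basis1))
    return all(abs(abs(inner(u, v)) - target) < tol
               for u in basis1 for v in basis2)

s = 1 / math.sqrt(2)
z_basis = [[1, 0], [0, 1]]      # |0>, |1>
x_basis = [[s, s], [s, -s]]     # (|0> + |1>)/sqrt(2), (|0> - |1>)/sqrt(2)

print(mutually_unbiased(z_basis, x_basis))
print(mutually_unbiased(z_basis, z_basis))
```

The first call reports True and the second False, since a basis is never unbiased with respect to itself.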

More generally, we shall be interested in a projective decomposition of the identity of $\mathcal{H}_p$,
hereafter called a "decomposition", a collection $\{P^k\}$ of projectors summing to the identity
$I_p$ and mutually orthogonal to each other,

$$I_p = \sum_k P^k, \quad P^k P^l = \delta_{kl} P^k. \tag{5}$$

(Recall that a projector is a Hermitian operator equal to its square, so its eigenvalues are
0 and 1.) No confusion arises if the same symbol $\mathcal{P}$ is used to denote an orthonormal basis
$\{|p^j\rangle\}$ or the collection $\{[p^j]\}$ of the corresponding projectors.

Given an orthonormal basis $\{|a^j\rangle\}$ of $\mathcal{H}_a$, any ket $|\psi\rangle$ in $\mathcal{H}_{ab}$ can be expanded in the form

$$|\psi\rangle = \sum_j |a^j\rangle \otimes |\beta^j\rangle, \tag{6}$$

where $|\beta^j\rangle = \langle a^j|\psi\rangle$ is uniquely determined by $|\psi\rangle$ and $|a^j\rangle$. If the $|\beta^j\rangle$ are mutually
orthogonal, we shall call (6) a Schmidt expansion, and sometimes write it in the alternative
form

$$|\psi\rangle = \sum_j \sqrt{p_j}\, |a^j\rangle \otimes |b^j\rangle \tag{7}$$

with the $\{|b^j\rangle\}$ an orthonormal basis of $\mathcal{H}_b$, and the $p_j$ summing to 1 when $|\psi\rangle$ is normalized,
$\langle\psi|\psi\rangle = 1$. By the support of an operator A we shall mean the smallest projector P such
that

$$PAP = A, \tag{8}$$

or the subspace $\mathcal{P}$ onto which this P projects. The rank of A is the trace of P, or the
dimension of $\mathcal{P}$, or the number of nonzero (positive) eigenvalues of $A^\dagger A$, or the rank of the
matrix representing A.



II B Maps and kets 

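Given any linear map $M : \mathcal{H}_a \to \mathcal{H}_b$ and an orthonormal basis $\mathcal{A} = \{|a^j\rangle\}$ of $\mathcal{H}_a$, one
can define a corresponding ket

$$|\psi\rangle = \sum_j |a^j\rangle \otimes M|a^j\rangle \tag{9}$$

on the tensor product $\mathcal{H}_{ab}$. Conversely, given such a ket, one can always expand it in the
form (6) using the basis $\mathcal{A}$, with expansion kets $|\beta^j\rangle = \langle a^j|\psi\rangle$, and define a map M by

$$M|a^j\rangle = |\beta^j\rangle, \tag{10}$$

and its extension to all of $\mathcal{H}_a$ by linearity. These two formulas define the map-ket duality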
used throughout the rest of this paper. 
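
In a fixed basis the duality of (9) and (10) amounts to rearranging the matrix elements of M into the amplitudes of a ket. The sketch below illustrates this under the assumption that both spaces use their standard bases and that a ket on the tensor product is stored as a nested list; the function names `map_to_ket` and `ket_to_map` are hypothetical.

```python
# A ket on H_a (x) H_b is stored as a d_a x d_b nested list: psi[j][m] is the
# amplitude of |a^j> (x) |b^m>, with the basis A taken to be the standard basis.

def map_to_ket(M):
    """Eq. (9): |psi> = sum_j |a^j> (x) M|a^j>, for a d_b x d_a matrix M."""
    d_b, d_a = len(M), len(M[0])
    # M|a^j> is column j of M, so row j of psi is that column.
    return [[M[m][j] for m in range(d_b)] for j in range(d_a)]

def ket_to_map(psi):
    """Eq. (10): M|a^j> = |beta^j>, where |beta^j> = <a^j|psi> is row j of psi."""
    d_a, d_b = len(psi), len(psi[0])
    return [[psi[j][m] for j in range(d_a)] for m in range(d_b)]

M = [[1, 2], [3, 4], [5, 6]]   # an arbitrary map from a qubit to a qutrit
psi = map_to_ket(M)
print(psi)
print(ket_to_map(psi) == M)
```

Applying `ket_to_map` after `map_to_ket` returns the original matrix, which is the statement that the two formulas are inverse to each other once the basis is fixed.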

The duality depends, obviously, on the choice of orthonormal basis $\mathcal{A}$; given a different
choice $\bar{\mathcal{A}} = \{|\bar{a}^j\rangle\}$ a given map will lead to a different ket, and vice versa. For those who

(like the author) prefer to write formulas whenever possible in basis-independent form, this 
dependence is somewhat annoying. One can get around it, as in [2JIS], by always using a single 
basis. We prefer to maintain the usual distinction between operators and matrices. The 
price for doing this is not exorbitant, because the basis dependence can always be expressed 
in terms of a suitable unitary transformation on $\mathcal{H}_a$. And if one is primarily concerned
with concepts which are invariant under local unitaries, meaning unitary operations which 
are tensor products of unitaries on individual subsystems, such basis dependence is not 
intolerable. 

A way of visualizing the relationship between $|\psi\rangle$ and M, and for understanding the
ambiguity associated with the choice of basis, is indicated by the circuit in Fig. 1, where $|\phi\rangle$
is a fully-entangled state

$$|\phi\rangle = \sum_j |a^j\rangle \otimes |v^j\rangle \tag{11}$$

on the system $\mathcal{H}_a \otimes \mathcal{H}_v$, $\mathcal{H}_v$ is an auxiliary Hilbert space of the same dimension as $\mathcal{H}_a$,
and $M|v^j\rangle = |\beta^j\rangle$ as in (10). Choosing a different fully-entangled state in place of (11)
would result in a different relationship between M and $|\psi\rangle$; this is precisely the ambiguity
previously discussed, and provides a good way of analyzing it.



Figure 1: Circuit illustrating map M - ket duality.



II C Channel kets and superoperators 

We adopt the following by now fairly standard model for a noisy quantum channel. A
unitary time transformation T maps the tensor product $\mathcal{H}_{ae}$ of the Hilbert space $\mathcal{H}_a$ of the
channel entrance $S_a$ and the space $\mathcal{H}_e$ of the (initial) environment $S_e$, at some initial time,
to $\mathcal{H}_{bf} = \mathcal{H}_b \otimes \mathcal{H}_f$, corresponding to the channel exit or output $S_b$ and environment $S_f$ at
some later time, Fig. 2. Initially the environment is in a fixed pure state $|e^0\rangle$, whereas the
initial state of the channel is arbitrary, not fixed in advance. Because $|e^0\rangle$ is fixed, the only
relevant effect of the unitary operator T is that embodied in the isometry $V : \mathcal{H}_a \to \mathcal{H}_{bf}$
defined by

$$V|a\rangle = T(|a\rangle \otimes |e^0\rangle), \tag{12}$$

and shown schematically in the second part of Fig. 2. Often $\mathcal{H}_a$ and $\mathcal{H}_b$ are identified with
each other, and $\mathcal{H}_e$ with $\mathcal{H}_f$. Maintaining the distinction allows for the possibility,
sometimes useful, that the dimensions of $\mathcal{H}_a$ and $\mathcal{H}_b$ may be different and, equally important,
permits a distinct label. It is sometimes useful to assume that the environment is initially
in a mixed, rather than a pure state, see Sec. V A, but there is no loss in generality in
assuming a pure state $|e^0\rangle$, since a mixed state can always be purified by introducing an
auxiliary system, which can then be thought of as part of $S_e$.

Figure 2: Quantum channel using a unitary transformation T or isometry V.

The channel ket $|\Psi\rangle \in \mathcal{H}_{abf}$ is defined as the ket dual to V in the sense of Sec. II B,

$$\sqrt{d_a}\,|\Psi\rangle = \sum_j |a^j\rangle \otimes V|a^j\rangle \in \mathcal{H}_a \otimes \mathcal{H}_{bf}, \tag{13}$$

using an orthonormal basis $\mathcal{A} = \{|a^j\rangle\}$ of $\mathcal{H}_a$. The normalization $\langle\Psi|\Psi\rangle = 1$ is of no great
importance, which is why $\sqrt{d_a}$ is placed on the left side of this equation, but does
simplify certain formulas. Notice that $|\Psi\rangle$ is a pure state on a tripartite system.

The channel ket can be visualized using Fig. 3, the obvious analog of Fig. 1, as obtained
by transmitting the $S_v$ part of the fully-entangled state (11) through the channel, while
preserving the $S_a$ part unchanged. It is important to distinguish the definition of the channel
ket, given in (13), from this visualization, in that $|\Psi\rangle$ is a mathematical object which
functions as a pre-probability, used to calculate probabilities of various events or processes
associated with the channel, as discussed in Sec. III, quite apart from whether the channel
is being used in the manner just described.



Figure 3: Circuit for visualizing the channel ket $|\Psi\rangle$.

Following the notation of Sec. II A, the symbol $\Psi$ denotes the dyad $|\Psi\rangle\langle\Psi|$, and subscripts
are used to indicate its partial traces. Of particular importance is the dynamical operator

$$R := \Psi_{ab} = \mathrm{Tr}_f(\Psi) \in \hat{\mathcal{H}}_{ab}, \tag{14}$$

which corresponds to the dynamical matrix defined in [3] (apart from the order ab as against
ba); the latter is R with a particular choice of basis. Since $\Psi$ is a positive operator, so is R,
and given the normalization in (13), R has unit trace. In addition, because V is an isometry,

$$R_a = \mathrm{Tr}_b(R) = \Psi_a = I_a/d_a. \tag{15}$$

Thus R is a density operator for the bipartite system $\mathcal{H}_{ab}$, with the special property that $R_a$
is proportional to the identity. Hence whatever intuition one possesses for mixed states on 
bipartite systems can at once be applied to R; e.g., one can ask if it is separable, and if not, 
how entangled it is according to any of the numerous measures of mixed-state entanglement, 
etc. 
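
The properties of R just stated can be checked numerically. The following sketch assumes the bit-flip channel ket given in (25) of Sec. II D, with an arbitrarily chosen error probability p; it forms R by tracing out the environment and then evaluates the trace of R and the partial trace $R_a$. The array layout and the variable names are assumptions made for illustration only.

```python
import math

p = 0.1  # error probability, an arbitrary illustrative value
c0, c1 = math.sqrt((1 - p) / 2), math.sqrt(p / 2)

# Bit-flip channel ket of Eq. (25) as real amplitudes Psi[a][b][f].
Psi = [[[0.0, 0.0] for b in range(2)] for a in range(2)]
Psi[0][0][0] = Psi[1][1][0] = c0   # sqrt(1-p) (|00> + |11>) (x) |0>, over sqrt(2)
Psi[0][1][1] = Psi[1][0][1] = c1   # sqrt(p)   (|01> + |10>) (x) |1>, over sqrt(2)

# Dynamical operator R = Tr_f(Psi), Eq. (14): R[a][b][a2][b2] = <a b|R|a2 b2>.
# The amplitudes are real, so no complex conjugation is needed here.
R = [[[[sum(Psi[a][b][f] * Psi[a2][b2][f] for f in range(2))
        for b2 in range(2)] for a2 in range(2)]
      for b in range(2)] for a in range(2)]

trace_R = sum(R[a][b][a][b] for a in range(2) for b in range(2))

# R_a = Tr_b(R), Eq. (15); it should equal I_a / d_a, here I/2.
R_a = [[sum(R[a][b][a2][b] for b in range(2)) for a2 in range(2)]
       for a in range(2)]

print(trace_R)
print(R_a)
```

Within rounding error the trace is 1 and $R_a$ is half the identity, as (15) requires for any isometry V.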

But in addition, R completely determines the properties of the noisy quantum channel,
that is, the channel superoperator, up to a unitary transformation of the channel input $\mathcal{H}_a$
corresponding to different choices for the basis used in the definition (13). The channel
superoperator $\mathcal{V}$ is the map from $\hat{\mathcal{H}}_a$ to $\hat{\mathcal{H}}_b$ defined by

$$\mathcal{V}(A) = \mathrm{Tr}_f(V A V^\dagger) \tag{16}$$
for any operator A in $\hat{\mathcal{H}}_a$. To explore how $\mathcal{V}$ is related to R, it is helpful to choose an
orthonormal basis $\{|f^l\rangle\}$ for $\mathcal{H}_f$, and expand $|\Psi\rangle$ as

$$|\Psi\rangle = \sum_l |\kappa^l\rangle \otimes |f^l\rangle \in \mathcal{H}_{ab} \otimes \mathcal{H}_f. \tag{17}$$

We shall refer to the expansion coefficients $\{|\kappa^l\rangle\}$ as Kraus kets, in that they can, using the
duality introduced in Sec. II B, be turned into maps

$$\sqrt{d_a}\,|\kappa^l\rangle = \sum_j |a^j\rangle \otimes K_l|a^j\rangle, \tag{18}$$

where the $K_l$ are the usual Kraus operators, labeled by subscripts as is the usual convention.
They can be used to express the channel superoperator in the familiar form

$$\mathcal{V}(A) = \sum_l K_l A K_l^\dagger. \tag{19}$$

The usual normalization $\sum_l K_l^\dagger K_l = I_a$ is the counterpart of (13).

The $K_l$ no longer depend upon the arbitrary choice of basis $\{|a^j\rangle\}$ used in defining $|\Psi\rangle$,
as this dependence is undone when kets are changed to maps (using the same basis) in (18),
but they do depend upon the choice of basis $\{|f^l\rangle\}$. One can eliminate, or (in degenerate
cases) at least mitigate this arbitrariness by making (17) a Schmidt expansion, so that the
$\{|\kappa^l\rangle\}$ are orthogonal to one another or, equivalently,

$$\mathrm{Tr}_a(K_l^\dagger K_m) = 0 \quad \text{for } l \neq m. \tag{20}$$

In that case the number of nonzero terms in (17), what could be called the Kraus rank of
the channel superoperator, is the rank (in the ordinary sense) of the dynamical operator
$R = \Psi_{ab}$.

Combining (17) and (18), one obtains the expression

$$R = \Psi_{ab} = \sum_l [\kappa^l] = \frac{1}{d_a} \sum_{j,k} |a^j\rangle\langle a^k| \otimes \sum_l K_l |a^j\rangle\langle a^k| K_l^\dagger \tag{21}$$

for R, and from it another formula

$$\mathcal{V}(A) = d_a\,\mathrm{Tr}_a[(A \otimes I) Q], \tag{22}$$

(the factor $d_a$ compensating for the unit trace of R and Q) for the channel superoperator
in terms of the transition operator Q, the partial transpose

$$Q = R^{T_A} = \frac{1}{d_a} \sum_{j,k} |a^k\rangle\langle a^j| \otimes \sum_l K_l |a^j\rangle\langle a^k| K_l^\dagger \in \hat{\mathcal{H}}_{ab} \tag{23}$$

of the dynamical operator with respect to the basis $\mathcal{A} = \{|a^j\rangle\}$. Once again, by using this
same basis a second time, its effect in defining $|\Psi\rangle$ has been undone, and Q is independent
of the basis, consistent with the fact that the superoperator $\mathcal{V}$ in (16) also does not depend
upon the choice of basis. Despite their close relationship, Q and R are very different types
of operators; the latter is positive, and the former, while it is Hermitian, will typically have
negative as well as positive eigenvalues. 
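
A numerical sanity check of the relation between Q and the superoperator may be helpful. The sketch below assumes the textbook Kraus operators of the bit-flip channel and an arbitrary value of p, builds Q in components from (23) with its unit-trace normalization, and compares the Kraus form (19) with $d_a\,\mathrm{Tr}_a[(A \otimes I)Q]$; the factor $d_a$ is the one that goes with a unit-trace Q. It also evaluates the expectation of Q in the singlet state, where a negative value shows that Q is not a positive operator. All function and variable names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Conjugate transpose."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

p, d_a = 0.1, 2     # illustrative error probability; qubit input
# Bit-flip Kraus operators in their textbook form (an assumption of this sketch).
kraus = [[[math.sqrt(1 - p), 0], [0, math.sqrt(1 - p)]],
         [[0, math.sqrt(p)], [math.sqrt(p), 0]]]

def channel(A):
    """Eq. (19): V(A) = sum_l K_l A K_l^dagger."""
    terms = [matmul(matmul(K, A), dagger(K)) for K in kraus]
    return [[sum(t[m][n] for t in terms) for n in range(2)] for m in range(2)]

# Eq. (23) in components: Q[k][m][j][n] = <a^k b^m|Q|a^j b^n>
#   = (1/d_a) sum_l <b^m|K_l|a^j> <a^k|K_l^dagger|b^n>.
Q = [[[[sum(K[m][j] * complex(K[n][k]).conjugate() for K in kraus) / d_a
        for n in range(2)] for j in range(2)]
      for m in range(2)] for k in range(2)]

def channel_via_Q(A):
    """d_a Tr_a[(A (x) I) Q], with the factor d_a that goes with a unit-trace Q."""
    return [[d_a * sum(A[x][y] * Q[y][m][x][n]
                       for x in range(2) for y in range(2))
             for n in range(2)] for m in range(2)]

A = [[1, 2j], [0.5, -1]]     # arbitrary test operator
direct, via_Q = channel(A), channel_via_Q(A)
max_diff = max(abs(direct[m][n] - via_Q[m][n])
               for m in range(2) for n in range(2))

# Expectation of Q in the singlet state (|01> - |10>)/sqrt(2), a real ket.
s = 1 / math.sqrt(2)
v = [[0, s], [-s, 0]]
singlet_expectation = sum(v[k][m] * Q[k][m][j][n] * v[j][n]
                          for k in range(2) for m in range(2)
                          for j in range(2) for n in range(2)).real
print(max_diff, singlet_expectation)
```

For this channel the singlet expectation works out to $-(1-2p)/2$, which is negative for $p < 1/2$, while the two ways of evaluating the superoperator agree to rounding error.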

The superoperator $\mathcal{V}$ is a map from $\hat{\mathcal{H}}_a$ to $\hat{\mathcal{H}}_b$, so it can be represented as a matrix once
orthonormal operator bases have been defined for these two spaces. There are many ways
of choosing such bases, but one that is particularly convenient when $\mathcal{H}_a$ and $\mathcal{H}_b$ are qubits
is the Pauli representation using $\{\sigma_a^j\}$, with j = 0 the identity and j = 1, 2, 3 the x, y, and
z Pauli matrices in the standard basis of $\mathcal{H}_a$, and similarly $\{\sigma_b^j\}$. Expanding the transition
operator Q in the Pauli form (see the examples in Sec. II D) often provides a clearer
notion of what a noisy channel "does" than is evident by looking at the Kraus operators.
There are various ways of generalizing this representation to higher-dimensional spaces. For
the case of a channel superoperator there is some advantage to using a basis of Hermitian
operators, rather than unitaries as in [20], because the resulting matrix is real. If the basis
is again denoted by $\{\sigma^j\}$, with $0 \le j \le d^2 - 1$ for a d-dimensional Hilbert space, one can
again let $\sigma^0$ be the identity, so that the orthogonality condition

$$\mathrm{Tr}(\sigma^j \sigma^k) = \delta_{jk}\, d \tag{24}$$

implies that $\sigma^j$ for j > 0 has zero trace, which makes it easy to take partial traces of
operators written in Pauli form.



II D Examples of one qubit channels 

We use the names for one qubit channels employed in Sec. 8.3 of ^H], but employ p in a 
way which identifies it as the probability of an error. The channel kets are sums of terms of 
the form $|ab\rangle \otimes |f\rangle$, where a and b are either 0 or 1, but f sometimes takes larger values.

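The bit flip channel is described by

$$\sqrt{2}\,|\Psi\rangle = \sqrt{1-p}\,(|00\rangle + |11\rangle) \otimes |0\rangle + \sqrt{p}\,(|01\rangle + |10\rangle) \otimes |1\rangle \tag{25}$$

leading to a transition operator

$$4Q = I + \sigma_a^1\sigma_b^1 + (1 - 2p)\,[\sigma_a^2\sigma_b^2 + \sigma_a^3\sigma_b^3] \tag{26}$$

in the Pauli representation. The dynamical operator R is the same except for a minus sign
multiplying the term $\sigma_a^2\sigma_b^2$, reflecting the fact that $\sigma^2$ changes sign when transposed.

For the amplitude damping channel the corresponding expressions are

$$\sqrt{2}\,|\Psi\rangle = (|00\rangle + \sqrt{1-p}\,|11\rangle) \otimes |0\rangle + \sqrt{p}\,|10\rangle \otimes |1\rangle, \tag{27}$$

$$4Q = I + p\,\sigma_b^3 + \sqrt{1-p}\,(\sigma_a^1\sigma_b^1 + \sigma_a^2\sigma_b^2) + (1-p)\,\sigma_a^3\sigma_b^3. \tag{28}$$

A depolarizing channel requires a larger environment:

$$2\,|\Psi\rangle = \sqrt{2-3p}\,(|00\rangle + |11\rangle) \otimes |0\rangle + \sqrt{p}\,(|00\rangle - |11\rangle) \otimes |1\rangle + \sqrt{2p}\,(|01\rangle \otimes |2\rangle + |10\rangle \otimes |3\rangle), \tag{29}$$

$$4Q = I + (1 - 2p)\,(\sigma_a^1\sigma_b^1 + \sigma_a^2\sigma_b^2 + \sigma_a^3\sigma_b^3). \tag{30}$$

Again, the dynamical operator R is obtained by changing the sign of $\sigma_a^2\sigma_b^2$.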
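
The Pauli form of Q for these examples can be checked against the Kraus description. The sketch below does this for amplitude damping, assuming the textbook Kraus operators for that channel and an arbitrary value of p. It relies on the observation that, for a unit-trace Q, the coefficient $c_{jk}$ of $\sigma_a^j\sigma_b^k$ in 4Q equals $\tfrac{1}{2}\mathrm{Tr}[\sigma^k\,\mathcal{V}(\sigma^j)]$; this relation and all names in the code are assumptions of the sketch rather than statements from the text.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Conjugate transpose."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

# Identity and the x, y, z Pauli matrices in the standard basis.
paulis = [[[1, 0], [0, 1]], [[0, 1], [1, 0]],
          [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

p = 0.3   # illustrative damping probability
# Textbook amplitude-damping Kraus operators (an assumption of this sketch).
kraus = [[[1, 0], [0, math.sqrt(1 - p)]],
         [[0, math.sqrt(p)], [0, 0]]]

def channel(A):
    """Eq. (19): V(A) = sum_l K_l A K_l^dagger."""
    terms = [matmul(matmul(K, A), dagger(K)) for K in kraus]
    return [[sum(t[m][n] for t in terms) for n in range(2)] for m in range(2)]

def pauli_coefficient(j, k):
    """c_jk = (1/2) Tr[sigma^k V(sigma^j)]."""
    prod = matmul(paulis[k], channel(paulis[j]))
    return ((prod[0][0] + prod[1][1]) / 2).real

c = [[pauli_coefficient(j, k) for k in range(4)] for j in range(4)]
for row in c:
    print([round(x, 6) for x in row])
```

The nonzero entries are $c_{00} = 1$, $c_{03} = p$, $c_{11} = c_{22} = \sqrt{1-p}$ and $c_{33} = 1-p$, in agreement with (28).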
II E From entangled states to channels



As shown in Sec. II C, any noisy channel modeled as in Fig. 2 can be mapped onto an
equivalent entangled ket $|\Psi\rangle$ on a tripartite system, and thence onto a density operator whose
partial transpose determines the channel superoperator. Can one do the reverse, starting
with a tripartite ket $|\Psi\rangle$ or a bipartite density operator R? Yes, aside from the condition that
$\Psi_a$ (or $R_a$) be proportional to the identity operator $I_a$. But if this is not true, can one still
turn an entanglement problem into a channel problem? There are at least two approaches,
each with advantages and disadvantages.

The first is to begin with a unitary operator or isometry as in Fig. 2, but then instead of
"throwing away" the environment $\mathcal{H}_f$, apply a projector F to this part of the output, and
condition on the resulting state. One can think of this as carrying out a measurement on $\mathcal{H}_f$
that determines whether F is true or false, and throwing away the results of all experiments
in which it is false. The consequence of an appropriately chosen "post selection" of this type
will be a set of (conditional) probabilities that correspond to those of the original ket or
density operator; in other words, one obtains the same pre-probability (see Sec. III below).
The second approach is based on Fig. 3, and the idea is to replace the fully-entangled $|\phi\rangle$
with a different entangled state, chosen so $\phi_a$ is no longer proportional to $I_a$, but to $\Psi_a$ (or
$R_a$).

The question remains as to whether either of these procedures is worthwhile, and that 
depends on one's goals. Rather than turning entanglement problems into channel problems, 
it may be simpler to do the reverse, as in the classification scheme proposed in Sec. VI. This
allows the mathematical structure of the two types of problem to be compared. If, on the 
other hand, there is quite a bit of useful mathematical and physical intuition to be wrung 
from contemplating how quantum systems develop in time, the approaches mentioned in the 
previous paragraph may be worthwhile. Until the channel-entanglement duality has been 
more thoroughly explored, it is hard to say which approach is best. In any case, there are 
significant entanglement problems that map in a simple way onto channel problems, and a 
study of what entanglement does and does not mean in such cases might be very helpful. 

III Quantum Information

III A Sample spaces and probabilities 

The basic concept of "information" used in the following discussion is that of a statistical 
correlation. This morning's newspaper contains information because the symbols are corre- 
lated in an appropriate way with yesterday's events. Information is contained in a photon 
traveling down an optical fiber because its properties are correlated with whatever produced 
it, and with the effects produced by the further processes it will undergo. An encrypted 
message contains information in that its symbols are correlated with those in the key used 
to encrypt or decrypt it. Shannon's information theory provides numerical measures for 
these statistical correlations, which apply to quantum as well as to classical systems (which, 
of course, are in fact quantum mechanical!), when probabilities have been properly defined. 

Standard probability theory [21, 22, 23] is based on the idea of a sample space of mutually
exclusive properties. A quantum sample space or framework can be constructed using the
mutually exclusive properties associated with a decomposition of the identity (Sec. II A) of
the Hilbert space used to describe the system. Given such a sample space $\{P^k\}$ one can assign
probabilities using the standard formula

$$\Pr(P^k) = \langle P^k\rangle = \langle\psi|P^k|\psi\rangle \quad \text{or} \quad \mathrm{Tr}(P^k \rho), \tag{31}$$

where the quantum system is assumed to be described by a ket $|\psi\rangle$ or density operator $\rho$
functioning as a pre-probability, i.e., as a device for generating probabilities [21]. Probabilities
in quantum mechanics are often discussed in terms of measurements, which provide a good 
approach to understanding them in operational terms, even though it is rather unsatisfactory 
from a fundamental perspective (the infamous "measurement problem"; see, e.g., [25]). For
present purposes such measurements should be thought of as ideal projective measurements 
which reveal the (microscopic) properties they are designed to measure; see the discussion in 
Chs. 17 and 18 of [23]. We shall have no need of more complicated concepts such as POVMs 
(see, e.g., p. 90 of [TH], or Ch. 7 of [2E]). From time to time there have been proposals to
introduce nonstandard notions of probability into quantum mechanics, but these have not 
proven very successful, and we shall not use them. 

In quantum mechanics, in contrast to classical physics, one is typically interested in a 
variety of sample spaces that are incompatible with each other, but whose probabilities can 
all be generated from a single pre-probability. For example, what is the probability that 
$S_x = +1/2$, or that $S_z = -1/2$, for a spin-half particle? The same ket or density operator
may be used to answer these questions by inserting different projectors in (31), but there
is no way of combining the answers to make them refer to a single physical system, as it
makes no sense to talk about $S_x = +1/2$ AND $S_z = -1/2$, or any other logical combination
of propositions associated with incompatible decompositions of the identity whose projectors
do not commute with each other. Traditional textbooks state that $S_x$ and $S_z$ cannot
be simultaneously measured, which is correct. But the reason such joint measurements are 
impossible in a quantum world is that the combined properties do not exist: such a combi- 
nation is incompatible with the mathematical structure of the quantum Hilbert space, see 
Ch. 4 of [24]. Treating incompatible sample spaces as if they were compatible and combining 
the probabilities of one with the other is the same sort of mistake as ignoring the difference 
between xp and px when these symbols refer to quantum operators. 
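
To make the spin-half example concrete, the following sketch uses a single ket as a pre-probability for two incompatible frameworks. It assumes an arbitrarily chosen real ket in the $S_z$ basis and the usual projectors for $S_x = +1/2$ and $S_z = -1/2$; the names are illustrative. Each probability is perfectly well defined, but the two projectors do not commute, so no sample space contains their conjunction.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

theta = 0.3
psi = [math.cos(theta), math.sin(theta)]   # a real spin-half ket in the S_z basis

P_x_plus = [[0.5, 0.5], [0.5, 0.5]]   # projector for S_x = +1/2
P_z_minus = [[0, 0], [0, 1]]          # projector for S_z = -1/2

def prob(P, ket):
    """Born rule <psi|P|psi>, valid as written for a real ket and real P."""
    return sum(ket[i] * P[i][j] * ket[j] for i in range(2) for j in range(2))

pr_x = prob(P_x_plus, psi)    # framework {S_x = +1/2, S_x = -1/2}
pr_z = prob(P_z_minus, psi)   # framework {S_z = +1/2, S_z = -1/2}

PQ = matmul(P_x_plus, P_z_minus)
QP = matmul(P_z_minus, P_x_plus)
commutator = [[PQ[i][j] - QP[i][j] for j in range(2)] for i in range(2)]
print(pr_x, pr_z)
print(commutator)
```

The nonzero commutator is the formal signal that the two decompositions are incompatible, even though the same ket generated both probabilities.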

Consequently, one must be careful when giving a physical interpretation to the various 
mathematical constraints, such as those in Sec. IV, relating probabilities on different incompatible
sample spaces generated by a single pre-probability. They cannot refer to a single
quantum system, as it cannot be simultaneously described by incompatible frameworks. In- 
stead, one must take a counterfactual approach: "This is what happens when a qubit initially 
in state $|0\rangle$ is sent through the channel, but if instead it had been in the state $(|0\rangle + |1\rangle)/\sqrt{2}$,
then. . . ." To be sure, counterfactuals can themselves produce headaches in quantum theory 
if improperly used; for a consistent approach, see Ch. 19 of [21]. Alternatively, one can 
imagine different experiments carried out on an array of nominally identical systems. 

In comparison with classical physics, the new and unfamiliar element in quantum infor- 
mation theory is the multiplicity of incompatible sample spaces and probability distributions 
associated with them, even when one is using a single pre-probability. Finding good ways to 
think about this is a fundamental problem, perhaps the fundamental problem, of quantum 
information, and thus a major challenge to our understanding the world in quantum terms. 




III B Correlations



Consider two systems $S_a$ and $S_b$, with Hilbert spaces $\mathcal{H}_a$ and $\mathcal{H}_b$, and let $\{A^j\}$ and $\{B^k\}$
be decompositions of the respective identities $I_a$ and $I_b$. On the tensor product $\mathcal{H}_{ab} = \mathcal{H}_a \otimes \mathcal{H}_b$
used to describe the combined systems the projectors $\{A^j B^k\}$ form a decomposition of $I_{ab}$,
and thus a sample space, to which probabilities may be assigned as in (31):

$$\Pr(A^j, B^k) = \langle A^j B^k\rangle = \mathrm{Tr}[(A^j \otimes B^k)\rho], \tag{32}$$

with $\rho = [\psi]$ for a pure state $|\psi\rangle$. The marginal distributions

$$\Pr(A^j) = \sum_k \Pr(A^j, B^k) = \langle A^j\rangle, \qquad \Pr(B^k) = \sum_j \Pr(A^j, B^k) = \langle B^k\rangle \tag{33}$$

are obtained by summing or by inserting $A^j \otimes I$ (i.e., $A^j$) or $I \otimes B^k$ (i.e., $B^k$) on the right
side of (32). One can think of $\Pr(A^j, B^k)$ as the joint probability distribution of two random
variables which take on integer values j and k, and apply to it any standard measure of
correlation including, if one wants, the Shannon mutual information $I(\mathcal{A}:\mathcal{B})$. Note, in
particular, the condition for statistical independence:

$$\Pr(A^j, B^k) = \Pr(A^j)\,\Pr(B^k), \quad \text{or} \quad \langle A^j B^k\rangle = \langle A^j\rangle\langle B^k\rangle. \tag{34}$$

If one thinks of $S_a$ and $S_b$ as physically separated systems, then the joint probability
distribution (32) will be the same as that of the outcomes of ideal measurements of $\{A^j\}$ and
$\{B^k\}$ carried out on the separate systems. Consequently, the measurement outcomes will be
correlated in precisely the same way as the quantum properties that have been measured, and
one can use either the language of properties (our approach) or of measurement outcomes
to discuss these statistical correlations. Discussions of measurements in textbooks often
refer to "observables" rather than decompositions. Given a decomposition $\{A^j\}$, one can
always construct a corresponding observable $O = \sum_j a_j A^j$ with distinct (real) eigenvalues:
$a_j \neq a_k$ for $j \neq k$. But for our purposes these eigenvalues play no role, so the language of
decompositions tends to be clearer than that referring to observables. 
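
The joint distribution (32), its marginals (33), and a standard correlation measure can be illustrated with a short computation. The sketch below assumes a particular partially entangled two-qubit ket and decompositions consisting of rank-one projectors onto orthonormal bases; the function names and the choice of state are hypothetical. The same pre-probability gives correlated outcomes in one framework and statistically independent ones, in the sense of (34), in another.

```python
import math

def joint(psi, basis_a, basis_b):
    """Eq. (32) for rank-one projectors: Pr(A^j, B^k) = |<a^j, b^k|psi>|^2."""
    def amp(u, v):
        return sum(complex(u[m]).conjugate() * complex(v[n]).conjugate() * psi[m][n]
                   for m in range(len(u)) for n in range(len(v)))
    return [[abs(amp(u, v)) ** 2 for v in basis_b] for u in basis_a]

def mutual_information(pr):
    """Shannon mutual information, in bits, of a joint distribution."""
    pa = [sum(row) for row in pr]            # marginal Pr(A^j), Eq. (33)
    pb = [sum(col) for col in zip(*pr)]      # marginal Pr(B^k), Eq. (33)
    return sum(x * math.log2(x / (pa[j] * pb[k]))
               for j, row in enumerate(pr)
               for k, x in enumerate(row) if x > 1e-15)

s = 1 / math.sqrt(2)
z_basis = [[1, 0], [0, 1]]
x_basis = [[s, s], [s, -s]]
# |psi> = sqrt(0.8)|00> + sqrt(0.2)|11>, stored as amplitudes psi[m][n].
psi = [[math.sqrt(0.8), 0], [0, math.sqrt(0.2)]]

pr_zz = joint(psi, z_basis, z_basis)   # z basis on both systems
pr_xz = joint(psi, x_basis, z_basis)   # x basis on S_a, z basis on S_b
print(mutual_information(pr_zz))
print(mutual_information(pr_xz))
```

With the z basis on both systems the mutual information is about 0.72 bits, while with the x basis on $S_a$ and the z basis on $S_b$ it vanishes to rounding error.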

III C Information present and absent

Because of the multiplicity of incompatible quantum sample spaces, one needs to identify 
different types or varieties of information potentially available about a particular system. 
Given a decomposition $\mathcal{A} = \{A^j\}$ of $I_a$, we shall say that the $\mathcal{A}$ information about $S_a$ is
present, or perfectly present, in another system $S_b$ for a given pre-probability provided there
exists a decomposition $\mathcal{B} = \{B^k\}$ of $I_b$ such that

$$\langle A^j B^k\rangle = \delta_{jk}\langle A^j\rangle = \delta_{jk}\langle B^k\rangle, \tag{35}$$

where one may have to renumber the projectors in one of the collections to satisfy this
condition. A little thought will show that the first equality implies the second. The symmetry
of the definition implies that when some type of information about $S_a$ is available in $S_b$, there
is also some type of information about $S_b$ available in $S_a$. Although we shall not make use
of it in this paper, it is worth mentioning that the Shannon mutual information $I(\mathcal{A}:\mathcal{B})$ in
this case is $-\sum_j p_j \log p_j$ with $p_j = \langle A^j\rangle$.

If the $\mathcal{A}$ information about $S_a$ is present in $S_b$ (in the sense just defined) for every
decomposition of $I_a$, we shall say that all the (quantum) information about $S_a$ is in $S_b$.
Clearly it suffices to check this for every orthonormal basis $\{|a^j\rangle\}$. Less obvious (see the theorems
in Sec. IV) is the fact that one need not check them all: two properly chosen incompatible
bases suffice. We shall say that $S_a$ and $S_b$ are informationally equivalent when all information
about $S_a$ is in $S_b$ and all information about $S_b$ is in $S_a$.

The $\mathcal{A} = \{A^j\}$ information about $\mathcal{S}_a$ is (completely) absent from $\mathcal{S}_b$ provided any choice
of a decomposition $\{B^k\}$ of $I_b$ is statistically independent of $\mathcal{A}$, (34). A little thought shows that
this is equivalent to the requirement that

$$\mathrm{Tr}_a(A^j \rho) = \langle A^j \rangle \rho_b = p_j \rho_b \tag{36}$$

for every $j$, where $\rho_b = \mathrm{Tr}_a(\rho)$ is the reduced density operator for $\mathcal{S}_b$. (Note that it suffices
to require that the operators defined by the left side of (36) be proportional to one another;
when that is so, summing them shows they are all proportional to $\rho_b$.) In other words, for
every $j$ such that $p_j$ is not zero, the density operator conditional on $A^j$,

$$\rho_b^j = \mathrm{Tr}_a(A^j \rho)/p_j, \tag{37}$$

is the same as $\rho_b$.
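
Condition (36) is also easy to check directly. The sketch below assumes NumPy and uses a classically correlated two-qubit density operator picked only as an illustration; `partial_trace_a` and `info_absent` are hypothetical helper names.

```python
import numpy as np

def partial_trace_a(op, da, db):
    """Trace out the first factor of an operator on H_a ⊗ H_b."""
    return np.trace(op.reshape(da, db, da, db), axis1=0, axis2=2)

def info_absent(rho, A, da, db, tol=1e-12):
    """Check Tr_a(A^j rho) = p_j rho_b for every projector A^j, i.e. (36)."""
    rho_b = partial_trace_a(rho, da, db)
    for Aj in A:
        cond = partial_trace_a(np.kron(Aj, np.eye(db)) @ rho, da, db)
        pj = np.trace(cond).real          # p_j = <A^j>
        if not np.allclose(cond, pj * rho_b, atol=tol):
            return False
    return True

# Illustrative pre-probability (an assumption): (|00><00| + |11><11|)/2.
rho = np.diag([0.5, 0.0, 0.0, 0.5])

z_vecs = np.eye(2)
x_vecs = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Z = [np.outer(z_vecs[:, j], z_vecs[:, j]) for j in range(2)]
X = [np.outer(x_vecs[:, j], x_vecs[:, j]) for j in range(2)]

print(info_absent(rho, X, 2, 2))   # X information about S_a: absent from S_b
print(info_absent(rho, Z, 2, 2))   # Z information about S_a: not absent
```

For this state one type of information about $\mathcal{S}_a$ is absent from $\mathcal{S}_b$ while another is not, so the two systems are not uncorrelated in the sense defined next.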

If for every decomposition $\mathcal{A}$ of $I_a$ (it suffices to check all orthonormal bases) the
corresponding information about $\mathcal{S}_a$ is absent from $\mathcal{S}_b$, one can show (theorem 1 (iii) in
Sec. IV) that

$$\rho = \rho_a \otimes \rho_b, \tag{38}$$

from which it follows that all information of any sort about $\mathcal{S}_b$ is also absent from $\mathcal{S}_a$. In this
case we shall say that $\mathcal{S}_a$ and $\mathcal{S}_b$ are (completely) uncorrelated. No conceivable measurement
on one of these systems will provide any information about the other.

In the case of three or more systems, the presence or absence of particular types of 
information about $\mathcal{S}_a$ satisfies some intuitively obvious rules. If $\mathcal{A}$ information about $\mathcal{S}_a$ is
present in $\mathcal{S}_b$, it is also present in the combined system $\mathcal{S}_b$ and $\mathcal{S}_c$, denoted by $\mathcal{S}_{bc}$. If it is
absent from $\mathcal{S}_{bc}$, it is absent from both $\mathcal{S}_b$ and $\mathcal{S}_c$. The same is true when "$\mathcal{A}$ information"
is replaced by "all information."

These definitions of information perfectly present or completely absent make no reference 
to any sort of numerical measure of correlation, and thus are useful for a qualitative rather 
than a quantitative discussion of quantum information. This is not to say that quantitative 
measures are unimportant — far from it — but they lie outside the scope of this paper. 
It is hoped that the qualitative approach developed here will help organize and motivate 
quantitative discussions; see Sec. VII B.

III D Correlations for channels 

The preceding discussion referred to properties of separated systems $\mathcal{S}_a$ and $\mathcal{S}_b$ at the
same time. Basically the same ideas apply in the case of quantum channels, where $\mathcal{S}_a$ is
the channel input at an earlier time and $\mathcal{S}_b$ its output at a later time (Sec. II C). The
only difference is the manner in which one calculates a joint probability distribution; (32) is
replaced by

$$\Pr(A^j, B^k) = \langle A^j B^k \rangle = \mathrm{Tr}[(A^j \otimes B^k) Q]. \tag{39}$$

Here the transition operator $Q$, see (23), takes the place of the density operator in (32).
The marginals are once again given by (33). The fact that $Q$ is the partial transpose of a
density operator $R$ guarantees that the probabilities in (39) are well defined; indeed, they
behave very much like those of a bipartite system described by $R$.

One can once again visualize $\{B^k\}$ in terms of idealized measurements of what emerges
from the channel, but the corresponding intuitive picture of $\{A^j\}$ is an ideal preparation.
Of course, it is no more possible to prepare a quantum system in a state of two (or more) 
incompatible properties than it is to measure such a state, for such states do not exist in 
the quantum world. And just as an ideal measurement reveals a property possessed by a 
quantum system at a slightly earlier time, an ideal preparation results in a quantum system 
having a specific property at a slightly later time. The language of "preparation" and 
"measurement" is useful both for providing quantum concepts with intuitive content and for 
relating quantum theory to laboratory experiments, but it should be used to illuminate, not 
replace, the notion of statistical correlations among microscopic properties, whether at the 
same or at different times, as this is the more fundamental concept. 

The correlations obtained using a transition operator, (39), are not entirely the same
as those arising from a density operator, (32), but the differences are rather subtle. Given
a pair of decompositions $\mathcal{A}$ and $\mathcal{B}$, there is no way of telling whether the joint probability
distribution comes from a density or a transition operator. What can happen with sets 
of correlations for incompatible decompositions, when they are generated by a single pre- 
probability, is best illustrated by means of an example. For a perfect one-qubit channel, 
p = in (J26j) . each component of angular momentum of a spin-half particle is identical at 
the entrance and at the exit,

$$\langle \sigma_x^a \sigma_x^b \rangle = \langle \sigma_y^a \sigma_y^b \rangle = \langle \sigma_z^a \sigma_z^b \rangle = 1. \tag{40}$$

However, this type of correlation is impossible for two separate systems at the same time.
What one can, instead, achieve by using an appropriate (pure state) density operator is

$$\langle \sigma_x^a \sigma_x^b \rangle = -\langle \sigma_y^a \sigma_y^b \rangle = \langle \sigma_z^a \sigma_z^b \rangle = 1 \tag{41}$$

or something similar: one of the terms (it need not be $\langle \sigma_y^a \sigma_y^b \rangle$) must have a minus sign, or
else there are three minus signs, as in the famous spin-singlet state used in discussions of the
Einstein-Podolsky-Rosen paradox. Similarly, (41) is impossible for a quantum channel.
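
The contrast between (40) and (41) can be reproduced in a few lines. This is a sketch under stated assumptions: $R$ is taken to be the normalized projector on $(|00\rangle + |11\rangle)/\sqrt{2}$ and $Q$ its partial transpose on the first factor; the normalization and the choice of which factor is transposed are illustrative and are not taken from the paper's own definitions. The helper names are hypothetical.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def partial_transpose_a(op, da, db):
    """Transpose the first tensor factor of an operator on H_a ⊗ H_b."""
    return op.reshape(da, db, da, db).transpose(2, 1, 0, 3).reshape(da * db, da * db)

def correlators(op):
    """The three numbers Tr[(sigma ⊗ sigma) op] for sigma = sigma_x, sigma_y, sigma_z."""
    return [np.trace(np.kron(s, s) @ op).real for s in (sx, sy, sz)]

phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
R = np.outer(phi, phi.conj())        # density operator of an entangled pair
Q = partial_transpose_a(R, 2, 2)     # plays the role of a transition operator

print(np.round(correlators(R), 12))  # pattern of (41): one minus sign
print(np.round(correlators(Q), 12))  # pattern of (40): all three equal to +1
print(np.linalg.eigvalsh(Q))         # Q has a negative eigenvalue
```

The negative eigenvalue of `Q` is one way to see that no density operator can produce the correlations (40).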

Interesting as these differences, which arise from the partial transpose in (23), may be,
they are basically irrelevant to the concerns of this paper. The definitions of information
perfectly present or completely absent given in Sec. III C above and the theorems in Sec. IV
below apply equally to channels and entangled states. In both cases the fundamental issue is 
statistical correlations and what quantum theory has to say about them, and that is exactly 
the same once proper account is taken of the partial transpose.

IV All or Nothing Theorems

It is convenient to organize a number of qualitative "all or nothing" results on the location 
of quantum information in a series of eight theorems. The first four refer to bipartite and 
the last four to tripartite systems. In several cases there are separate results depending upon 
whether the pre-probability is a pure state, indicated by a ket $|\Psi\rangle$, or a density operator $\rho$.
The former are stronger than the latter, and the reader should keep in mind that any result 
that is valid for a density operator applies equally to the case of a pure state, even if that is 
not explicitly stated. 

While the theorems are stated for entangled states, thought of as different systems at a 
single instant of time, they apply equally to correlations at two different times in a quantum 
channel, for which $|\Psi\rangle$ is the channel ket. The bipartite systems used in the first four theorems
are sometimes designated $\mathcal{S}_{ab}$ and sometimes $\mathcal{S}_{ac}$. This makes the notation consistent
with the later theorems for tripartite systems, where information about $\mathcal{S}_a$ is present in $\mathcal{S}_b$
and/or absent from $\mathcal{S}_c$. Note that, in agreement with the definitions in Sec. III C, "present"
means perfectly or completely present; "absent" means completely absent. The proofs will 
be found in App. A. 

The tripartite theorems have a no-cloning "smell" to them, and represent an attempt to 
give this important, but somewhat elusive, notion a precise information-theoretic content. 
The absence of theorems for $p$-part systems with $p \geq 4$ reflects our inability to find results
of corresponding generality, and we hope our readers will be more successful. But keep in
mind that a tripartite theorem might, for example, be usefully applied to $\mathcal{S}_{abcd}$ thought of as
consisting of $\mathcal{S}_a$, $\mathcal{S}_b$, and $\mathcal{S}_{cd}$, a strategy employed in discussing quantum codes in Sec. V C.

Theorem 1. Absence of information. 

i) If $\mathcal{A} = \{A^l\}$ is a decomposition of $I_a$, the $\mathcal{A}$ information about $\mathcal{S}_a$ is absent from $\mathcal{S}_c$
for a pre-probability $|\Psi\rangle \in \mathcal{H}_{ac}$ if and only if

$$P A^l P = a_l P, \tag{42}$$

where $P$ is the projector on the support of $\Psi_a$, and the $a_l$ are (nonnegative) constants. The
following is equivalent to (42):

$$\langle p^j | A^l | p^k \rangle = a_l \delta_{jk}, \tag{43}$$

where $\{|p^j\rangle\}$ is a collection of orthonormal states which span the support of $\Psi_a$, so that
$P = \sum_j |p^j\rangle\langle p^j|$.

ii) If $\mathcal{A} = \{|a^j\rangle\}$ is an orthonormal basis and all $\mathcal{A}$ information about $\mathcal{S}_a$ is absent from
$\mathcal{S}_c$ for $|\Psi\rangle \in \mathcal{H}_{ac}$, then

$$|\Psi\rangle = |a\rangle \otimes |c\rangle \tag{44}$$

is a product state on $\mathcal{H}_a \otimes \mathcal{H}_c$.

iii) All information about $\mathcal{S}_a$ is absent from $\mathcal{S}_c$ for a pre-probability $\rho$ on $\mathcal{H}_{ac}$ if and only if

$$\rho = \rho_a \otimes \rho_c, \tag{45}$$

which implies that all information about $\mathcal{S}_c$ is absent from $\mathcal{S}_a$ (the two are uncorrelated).

Theorem 2. Presence of particular information.

i) The $\mathcal{A} = \{A^l\}$ information about $\mathcal{S}_a$ is present in $\mathcal{S}_b$ for $\rho$ on $\mathcal{H}_{ab}$ if and only if

$$\Lambda^l \Lambda^m = 0 \text{ for } l \neq m, \tag{46}$$

where

$$\Lambda^l = \mathrm{Tr}_a(A^l \rho). \tag{47}$$

ii) The $\mathcal{A} = \{A^l\}$ information about $\mathcal{S}_a$ is present in $\mathcal{S}_b$ for $|\Psi\rangle \in \mathcal{H}_{ab}$ if and only if

$$[A^l, \Psi_a] = 0 \tag{48}$$

for all $l$. In particular, if $\mathcal{A} = \{|a^j\rangle\}$ is an orthonormal basis, (48) is equivalent to the
requirement that

$$|\Psi\rangle = \sum_j |a^j\rangle \otimes |\beta^j\rangle \tag{49}$$

be a Schmidt expansion, i.e., $\langle \beta^j | \beta^k \rangle = 0$ for $j \neq k$.

iii) If the $\mathcal{A} = \{A^l\}$ information about $\mathcal{S}_a$ is present in $\mathcal{S}_b$ for $\rho$ on $\mathcal{H}_{ab}$, then for all $l$

$$[A^l, \rho_a] = 0. \tag{50}$$

Note that if $\mathcal{A}$ is an orthonormal basis, (48) and (50) are equivalent to the assertion that
the $\Psi_a$ or $\rho_a$ matrices are diagonal in this basis.

Theorem 3. Presence of all information. 

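i) All information about $\mathcal{S}_a$ is in $\mathcal{S}_b$ for $|\Psi\rangle \in \mathcal{H}_{ab}$ if and only if

$$\Psi_a = I_a/d_a, \tag{51}$$

i.e., $|\Psi\rangle$ is maximally entangled.

ii) All information about $\mathcal{S}_a$ is in $\mathcal{S}_b$ for $\rho$ on $\mathcal{H}_{ab}$ if and only if there are Hilbert spaces
$\mathcal{H}_d$ and $\mathcal{H}_e$ whose tensor product is $\mathcal{H}_b$ or a subspace of $\mathcal{H}_b$, and $\rho$ is of the form

$$\rho = [\phi] \otimes \rho_e \text{ on } \mathcal{H}_{ad} \otimes \mathcal{H}_e, \tag{52}$$

where $[\phi] = |\phi\rangle\langle\phi|$ projects on a fully-entangled state $|\phi\rangle \in \mathcal{H}_{ad}$. This last implies (but is not
implied by)

$$\rho_a = I_a/d_a. \tag{53}$$

iii) All information about $\mathcal{S}_a$ is in $\mathcal{S}_b$ and all information about $\mathcal{S}_b$ is in $\mathcal{S}_a$, i.e., the two
systems are informationally equivalent, if and only if the pre-probability is a fully-entangled
pure state, i.e., maximally entangled with $\mathcal{H}_a$ and $\mathcal{H}_b$ of the same dimension.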

The utility of theorem 3 increases significantly through the existence of some (seemingly)
rather weak conditions which imply that all information about $\mathcal{S}_a$ is in $\mathcal{S}_b$. To this end we
need the following definition. Two decompositions $\mathcal{A} = \{A^j\}$ and $\bar{\mathcal{A}} = \{\bar{A}^k\}$ of $I_a$ are strongly
incompatible if there exists no projector $P$, apart from $P = 0$ and $P = I_a$, that commutes
with all the $\{A^j\}$ and all the $\{\bar{A}^k\}$. This is, for example, the case when $\mathcal{A} = \{|a^j\rangle\}$ and
$\bar{\mathcal{A}} = \{|\bar{a}^k\rangle\}$ are two orthonormal bases for which

$$\langle a^j | \bar{a}^k \rangle \neq 0 \tag{54}$$

for all $j$ and $k$, a condition which is fulfilled when the two bases are mutually unbiased,
(J3J), but is obviously much weaker. Strong incompatibility is weaker still; it is possible for
a number of the inner products in (54) to vanish provided a sufficient number are nonzero.
Indeed, two decompositions can be strongly incompatible without all of the projectors, or,
in some cases, any of the projectors being onto pure states. We shall not pursue the matter
further at this point, but instead state the desired result:

Theorem 4. Strong incompatibility. Let $\mathcal{A}$ and $\bar{\mathcal{A}}$ be two strongly incompatible decompositions
of $I_a$, according to the preceding definition, and suppose that both the $\mathcal{A}$ and the $\bar{\mathcal{A}}$
information about $\mathcal{S}_a$ is in $\mathcal{S}_b$. Then

$$\rho_a = I_a/d_a, \tag{55}$$

and if, in addition, $\rho = |\Psi\rangle\langle\Psi|$ is a pure state on $\mathcal{H}_{ab}$, then all information about $\mathcal{S}_a$ is in $\mathcal{S}_b$.
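
A small numerical illustration of theorem 4 for one qubit pair: the sketch below uses the commutator criterion of theorem 2 (ii) as the test for presence, with the $\sigma_z$ and $\sigma_x$ eigenbases as the two (mutually unbiased, hence strongly incompatible) decompositions. The one-parameter family of kets and the function names are assumptions made for the example.

```python
import numpy as np

def reduced_a(psi, da, db):
    """Reduced density operator Psi_a of a ket on H_a ⊗ H_b."""
    m = psi.reshape(da, db)
    return m @ m.conj().T

def present_pure(psi, basis, da, db, tol=1e-12):
    """Theorem 2 (ii): present iff every basis projector commutes with Psi_a."""
    rho_a = reduced_a(psi, da, db)
    for j in range(da):
        v = basis[:, j]
        proj = np.outer(v, v.conj())
        if not np.allclose(proj @ rho_a, rho_a @ proj, atol=tol):
            return False
    return True

Z_BASIS = np.eye(2)
X_BASIS = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # unbiased w.r.t. Z_BASIS

def ket(theta):
    """cos(theta)|00> + sin(theta)|11>, a hypothetical family of pre-probabilities."""
    return np.array([np.cos(theta), 0.0, 0.0, np.sin(theta)])

for theta in (np.pi / 6, np.pi / 4):
    psi = ket(theta)
    both = present_pure(psi, Z_BASIS, 2, 2) and present_pure(psi, X_BASIS, 2, 2)
    maximally_mixed = bool(np.allclose(reduced_a(psi, 2, 2), np.eye(2) / 2))
    print(f"theta={theta:.3f}  Z and X present: {both}  Psi_a = I/2: {maximally_mixed}")
```

In this family the Z information is always present, but the X information is present only at the maximally entangled point, where $\Psi_a = I_a/d_a$ as in (55).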

The following theorems refer to a tripartite system $\mathcal{S}_{abc}$.

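Theorem 5. All information absent. If for $|\Psi\rangle \in \mathcal{H}_{abc}$ all information about $\mathcal{S}_a$ is absent
from $\mathcal{S}_c$, there are Hilbert spaces $\mathcal{H}_d$ and $\mathcal{H}_e$ whose tensor product $\mathcal{H}_{de}$ is either $\mathcal{H}_b$ or a
subspace of $\mathcal{H}_b$, and $|\Psi\rangle$ is of the form

$$|\Psi\rangle = |\chi\rangle \otimes |\psi\rangle \in \mathcal{H}_{ad} \otimes \mathcal{H}_{ce}. \tag{56}$$

Only if the support of $\Psi_b$ is a proper subspace of $\mathcal{H}_b$ will $\mathcal{H}_{de}$ differ from $\mathcal{H}_b$, and in that
case it can be identified with the subspace. The "hidden product" structure of (56) turns
out to be a surprisingly useful tool.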

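Theorem 6. Particular information present for a pure state. For a pre-probability $|\Psi\rangle \in \mathcal{H}_{abc}$:

i) If $\mathcal{A} = \{|a^j\rangle\}$ is an orthonormal basis of $\mathcal{H}_a$, a necessary and sufficient condition for
the $\mathcal{A}$ information to be present in $\mathcal{S}_b$ is that

$$\Psi_{ac} = \sum_j |a^j\rangle\langle a^j| \otimes \Gamma^j, \tag{57}$$

where the $\{\Gamma^j\}$ are (positive) operators on $\mathcal{H}_c$.

ii) If for some decomposition $\bar{\mathcal{A}} = \{\bar{A}^k\}$ of $I_a$,

$$\Psi_{ac} = \sum_k \bar{A}^k \otimes \Gamma^k, \tag{58}$$

the $\bar{\mathcal{A}}$ information about $\mathcal{S}_a$ is in $\mathcal{S}_b$, and if $\mathcal{A} = \{A^l\}$ is a compatible decomposition of $I_a$
in the sense that all the $\{A^l\}$ projectors commute with all the $\{\bar{A}^k\}$ projectors, then the $\mathcal{A}$
information is also present in $\mathcal{S}_b$. (In particular, $\mathcal{A}$ may be an orthonormal basis in which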
the $\{\bar{A}^k\}$ are diagonal.)

Theorem 7. Particular information present for a mixed state. Suppose that the $\mathcal{A} = \{|a^j\rangle\}$
information about $\mathcal{S}_a$ is in $\mathcal{S}_b$ for $\rho$ on $\mathcal{H}_{abc}$. Then

i) The reduced density operator on $\mathcal{H}_{ac}$ is of the form

$$\rho_{ac} = \sum_j |a^j\rangle\langle a^j| \otimes \Gamma^j, \tag{59}$$

where the $\{\Gamma^j\}$ are (positive) operators on $\mathcal{H}_c$.

ii) If $\bar{\mathcal{A}} = \{|\bar{a}^k\rangle\}$ is another orthonormal basis of $\mathcal{H}_a$, and $\mathcal{A}$ and $\bar{\mathcal{A}}$ are mutually unbiased,
then no $\bar{\mathcal{A}}$ information is in $\mathcal{S}_c$, and

$$\mathrm{Tr}(\rho [\bar{a}^k]) = 1/d_a, \tag{60}$$

independent of $k$.

Theorem 8. No splitting theorem. 

i) If for $\rho$ on $\mathcal{H}_{abc}$ all the information about $\mathcal{S}_a$ is in $\mathcal{S}_b$, then there is no information about
$\mathcal{S}_a$ in $\mathcal{S}_c$,

$$\rho_{ac} = \rho_a \otimes \rho_c. \tag{61}$$

ii) If for $|\Psi\rangle \in \mathcal{H}_{abc}$ all the information about $\mathcal{S}_a$ is in $\mathcal{S}_{bc}$, and none of it is in $\mathcal{S}_c$, then
it is all in $\mathcal{S}_b$.

iii) If for $\rho$ on $\mathcal{H}_{abc}$ all the information about $\mathcal{S}_a$ is in $\mathcal{S}_{bc}$, but none of it is in $\mathcal{S}_c$, then the
dimension of $\mathcal{H}_b$ is not less than that of $\mathcal{H}_a$.

Note that (iii) in this last theorem is a weaker result than (ii), for if all the $\mathcal{S}_a$ information
is in $\mathcal{S}_b$, then by theorem 3 (ii) the dimension of $\mathcal{H}_b$ cannot be less than that of $\mathcal{H}_a$. The
difference between (ii) and (iii) turns out to be of some interest for understanding quantum
codes, Sec. V C.



V Applications 

V A Channels with mixed-state environment 

There is no loss in generality in assuming the environment for a quantum channel is
initially in a pure state, Fig. El provided the dimension $d_e$ of $\mathcal{H}_e$ is at least $d_a^2$. The question
has been raised [2ZII2H1 as to what channels can be produced using a smaller $d_e$, e.g., $d_e = d_a$,
if one assumes an initial mixed state for the environment. 

Such a channel can be modeled in the manner indicated in Fig. 4 with a "large" environment
$\mathcal{S}_{ed}$ initially in a pure state $|\chi\rangle$, which when traced down to $\mathcal{H}_e$ yields the desired
mixed-state density operator. The unitary transformation T maps 7i ve onto Hbc to produce 
the analog of Fig. where $f$ has become the pair $cd$, and $|\phi\rangle$ is again the fully-entangled
state (JTTJ). The channel ket

$$|\Psi\rangle = (I_a \otimes T \otimes I_d)(|\phi\rangle \otimes |\chi\rangle) \tag{62}$$

is a pure state of $\mathcal{H}_{abcd}$.

Figure 4: Channel and channel ket $|\Psi\rangle$ for a mixed-state environment.



This channel ket has the interesting property

$$\Psi_{ad} = \Psi_a \otimes \Psi_d, \tag{63}$$

which means that $\mathcal{S}_a$ and $\mathcal{S}_d$ are uncorrelated; no information about one is available in the
other. It follows from the fact that the product state on the right side of (62) has this
property, which is preserved during time development because the unitary operator $T$ does
not act on $\mathcal{H}_{ad}$. As a consequence, $\Psi_{ad}$ (and therefore also its partial traces $\Psi_a$ and $\Psi_d$) is
independent of time. Note that this invariance is not true (in general) if $T$ is not a unitary
operator. The reason, in physical terms, is that a general map from 7i ve to 7i& c can be 
thought of as involving post selection, based upon some sort of joint measurement. Since S a 
is correlated with S z and Sd with S e through the entangled initial states, the final state of 
affairs conditioned on the outcome of such a measurement may very well contain correlations 
between $\mathcal{S}_a$ and $\mathcal{S}_d$.

Not only is (63) a consequence of our model of a mixed-state environment, it comes
close to being the very essence of the matter in light of theorem 5 applied to the tripartite
$\mathcal{H}_a \otimes \mathcal{H}_{bc} \otimes \mathcal{H}_d$, for that tells us that $|\Psi\rangle$ necessarily involves a "hidden product" structure.
What is required to bring that structure to light is a suitable unitary transformation, which
is $T$ in Fig. 4. To be sure, theorem 5 does not tell us that $|\phi\rangle$ shall be fully entangled,
which suggests that the problem of a channel with a mixed-state environment is actually 
part of a more general information-theoretical question about entangled states on 4-part 
systems, and exploring it from this perspective may be useful. In addition, our analysis 
suggests a close connection between such channels and properties of unitary transformations 
on bipartite systems. 



V B CQ channels

The notion of a CQ or "classical-quantum" channel was introduced in and has been
the subject of some recent studies |H] in connection with entanglement-breaking channels,
which were introduced in |PjJ. An entanglement-breaking channel may be defined as one
in which the dynamical operator $R$ in (21) is separable, in the standard way in which that
term is applied to density operators (see, e.g., jSHE2I, Sec. 2.2.3 of pQ), and a CQ channel
is a particular case of an entanglement-breaking channel in which $R$ has the form

(64)

using a suitably chosen orthonormal basis $\mathcal{A} = \{|a^j\rangle\}$ for $\mathcal{H}_a$, and positive operators $B_j$
of unit trace (to ensure (15)) on $\mathcal{H}_b$. The remarks which follow apply equally to a QC or
"quantum-classical" channel, with the roles of a and b interchanged. 

Introducing the channel ket $|\Psi\rangle \in \mathcal{H}_{abf}$ with $R = \Psi_{ab}$, (JHJ), allows one to apply theorem 6
(i) in order to characterize a CQ channel as one in which there is an orthonormal basis for the
channel entrance such that the information associated with this basis is perfectly present in
the environment $\mathcal{S}_f$ at the later time. Note that such a characterization is not immediately
obvious from considering the dynamical operator, or, equivalently, the channel superoperator, 
for these are obtained by tracing out, thus ignoring, the environment, whereas the property 
which provides the simplest characterization in information-theoretic terms has very much 
to do with what information is available in the environment! 

Using a channel ket in no way reduces the value of the insights provided in the studies 
cited above, nor does it supply (at least in any obvious sense) alternative tools for arriving at 
the technical results in those papers. But it does suggest a genuinely quantum-mechanical 
and information-theoretical description of what is "classical" (the C in CQ) about a CQ
channel: namely, the environment provides perfect decoherence in a particular basis, as a
consequence of which no information in any "complementary", which is to say mutually
unbiased, basis is available at the channel exit, theorem 7 (ii). This is typical of what is
generally referred to as "classical communication." 

V C Information location in quantum codes 

Quantum codes allow quantum information to be preserved against the effects of noise, 
whether due to interaction with the environment in a quantum communication setting, or 
imperfect gates in a quantum computer, and thus they have received a great deal of attention; 
for an introduction, see |3Sj and Ch. 10 of (THj. Our purpose here is not to contribute to 
the technical literature, but instead to point out how the basic operation of such a code can 
be understood in terms of the presence or absence of certain types of information in certain 
places. 

The standard scenario is one in which the quantum information is embedded in a code 
$\mathcal{B}$, a $K$-dimensional subspace of the Hilbert space

$$\mathcal{H}_d = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_n \tag{65}$$

associated with $n$ carriers of the coded information. The simplest situation is one in which
$K = 2 = d_m$ for $1 \leq m \leq n$, but most of what we have to say applies more generally. Define
the security $s$ of the code to be the largest integer such that the encoded information is
entirely absent from any set of $s$ or fewer carriers (in a sense made precise in (68) below).
That is, an eavesdropper could learn nothing at all by carrying out arbitrary measurements 
on a set of s carriers, but could learn something from a suitable set of s + 1 carriers. In the 
literature it is customary to refer to s + 1 as the "distance" d of the code, using an analogy 
with classical codes in which d is the minimum Hamming distance between two code words. 
For a quantum code the notion of "distance" is somewhat obscure, as is the notion of code 
word, whereas s has a simple intuitive interpretation. 

For analyzing the security and the error-correction properties of the code it is convenient
to define a channel ket

$$\sqrt{K}\,|\Psi\rangle = \sum_j |a^j\rangle \otimes |b^j\rangle \in \mathcal{H}_a \otimes \mathcal{H}_d, \tag{66}$$

where the $\{|a^j\rangle\}$ form an orthonormal basis of the channel entrance $\mathcal{H}_a$, with $d_a = K$, and
the $\{|b^j\rangle\}$ an orthonormal basis of the code subspace $\mathcal{B}$ with projector

$$B = \sum_j |b^j\rangle\langle b^j|. \tag{67}$$

Thus the encoding operation maps $\mathcal{H}_a$ onto $\mathcal{B}$. One can visualize $|\Psi\rangle$ using Fig. El but with
$\mathcal{S}_b$ and $\mathcal{S}_f$ combined to form $\mathcal{S}_d$.

The security condition introduced earlier can now be stated as

$$\Psi_{a\omega} = \Psi_a \otimes \Psi_\omega, \tag{68}$$

where $\omega$ denotes any subset of $s$ integers drawn from $\{1, 2, \ldots, n\}$. Note that if (68) holds
for such a set, it also holds for a smaller set; simply take an appropriate partial trace of
both sides. In view of theorem 1 (iii), (68) expresses precisely what we want to say by the
security condition: if it is satisfied, no conceivable measurement on $\mathcal{S}_\omega$ will reveal anything
about any sort of information in the channel entrance, whereas if it is not satisfied, some 
sort of information will be at least partially available to an eavesdropper. 

From the definition (66) it is obvious that $\Psi_a = I_a/d_a$, so by theorem 3 (i) all information
about $\mathcal{S}_a$ is in $\mathcal{S}_d$. Thus by theorem 8 (ii), if none of this information is in $\mathcal{S}_\omega$, it must be
in the complement of this system in $\mathcal{S}_d$. That is, all the information about $\mathcal{S}_a$ is available
in any collection of $n - s$ carriers; given any such set, there will be a means of extracting
or recovering the information from it even if the other carriers are ignored. This provides a 
preliminary understanding in information-theoretic terms of how a quantum error-correcting 
code functions, though some additional points remain to be dealt with. 

In order to relate the security of the code to the discussion of error correction found in
|34[ , it is helpful to introduce the following definition. An operator $F$ on $\mathcal{H}_d$ will be
said to have a base $\mathcal{S}_\omega$, where $\omega$ is some subset of, and $\bar{\omega}$ its complement in, $\{1, 2, \ldots, n\}$,
provided

$$F = F_\omega \otimes I_{\bar{\omega}} \text{ on } \mathcal{H}_\omega \otimes \mathcal{H}_{\bar{\omega}}, \tag{69}$$

and $\omega$ is the smallest set for which $F$ can be written in this form. The size of the base of $F$
is the number of carriers in $\mathcal{S}_\omega$, the number of integers in $\omega$.

A code has security $s$ when for every operator $F$ with a base whose size does not exceed
$s$ it is the case that

$$B F B = b(F) B, \text{ or } \langle b^j | F | b^k \rangle = b(F) \delta_{jk}, \tag{70}$$

and $s$ is the largest integer for which this is the case. Here $b(F)$ is a (complex) number that
depends upon $F$, but not on $j$ or $k$, and $B$ is the projector in (67). The two equalities in
(70) are equivalent because the second is simply the first expressed as a matrix when one
extends $\{|b^j\rangle\}$ to an orthonormal basis of $\mathcal{H}_d$. To see that (70) is correct, first apply it in the
case where $F$ is a projector in $\mathcal{H}_\omega$ for some $\omega$ for which (68) holds, and use theorem 1 (i),
with $\mathcal{H}_a$ in the theorem replaced by $\mathcal{H}_d$, and $\mathcal{H}_c$ by $\mathcal{H}_a$, applied to the decomposition $\{F, I_d - F\}$ of
$I_d$. Any operator on $\mathcal{H}_\omega$ can be expressed as a linear combination of projectors, and hence
by linearity, and the "if and only if" of theorem 1 (i), we arrive at the equivalence of (68)
and (70) as statements that $\mathcal{H}_a$ and $\mathcal{H}_\omega$ are uncorrelated.

Now (70) is very similar to the necessary and sufficient condition

$$\langle b^j | K_l^\dagger K_m | b^k \rangle = b_{lm} \delta_{jk} \tag{71}$$

of [33] (in a slightly different notation) for a code to be able to correct a class of errors
corresponding to the Kraus operators $\{K_l\}$ acting on the space $\mathcal{H}_d$. If these Kraus operators
have a base no larger than $t$, then $F = K_l^\dagger K_m$ has a base that is no larger than $2t$, and we
arrive at the condition

$$s = 2t \tag{72}$$

relating the security $s$ to the maximum number of errors $t$ which can be corrected. That is,
a code which allows full recovery of information when $t$ carriers are tampered with in any
way, and one does not know which carriers have been affected, must allow full recovery when
any known set of $s = 2t$ carriers have been tampered with; in the latter case the information
will be recovered from the $n - 2t$ remaining carriers. Thus the well-known five qubit code
(see |34[ and p. 469 of ^H]) allows error recovery in the case of tampering with any
one of the five carriers, but also if any two are stolen, since recovery is then carried out on
the three that remain. (For a helpful discussion of this somewhat confusing point, see |37j.)
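
The security of the five qubit code can be checked against (70) by brute force. The following is a sketch, not the construction used in the cited references: it assumes the commonly quoted cyclic stabilizer generators XZZXI, IXZZX, XIXZZ, ZXIXZ (not given in this paper), builds the code projector $B$ from them, and tests (70) on Pauli products, which by linearity suffices for all operators with the same base. All function names are hypothetical.

```python
import itertools
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZZXI'."""
    return reduce(np.kron, [PAULI[c] for c in label])

# Assumed stabilizer generators of the five qubit code (standard cyclic form).
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

def code_projector():
    """B = product over generators g of (I + g)/2; it projects on the code subspace."""
    B = np.eye(32, dtype=complex)
    for g in GENERATORS:
        B = B @ (np.eye(32) + pauli_string(g)) / 2
    return B

def satisfies_70(B, F, tol=1e-10):
    """True when B F B = b(F) B, with b(F) = Tr(B F)/Tr(B)."""
    b = np.trace(B @ F) / np.trace(B)
    return bool(np.allclose(B @ F @ B, b * B, atol=tol))

def secure_up_to(B, n, size):
    """Check (70) for every Pauli product whose base has at most `size` carriers."""
    for w in range(1, size + 1):
        for sites in itertools.combinations(range(n), w):
            for ops in itertools.product("XYZ", repeat=w):
                label = ["I"] * n
                for site, op in zip(sites, ops):
                    label[site] = op
                if not satisfies_70(B, pauli_string("".join(label))):
                    return False
    return True

B = code_projector()
print(round(np.trace(B).real))      # dimension K of the code subspace
print(secure_up_to(B, 5, 2))        # (70) holds for bases of size <= 2
print(secure_up_to(B, 5, 3))        # and fails for some base of size 3
```

Under these assumptions the check gives security $s = 2$, consistent with $s = 2t$ for $t = 1$.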

The foregoing considerations make it possible to understand in information-theoretical
terms the quantum Singleton lower bound

$$n \geq 4t + \log K / \log D \tag{73}$$

on the number of carriers, each assumed to have a Hilbert space of dimension $D$, in a
quantum code [38]; also see p. 568 of jTH]. One argues as follows. In order to correct up to $t$
errors on unknown carriers the code must have a security of $s = 2t$: there is no information
about $\mathcal{S}_a$ in any collection of $2t$ carriers, so by theorem 8 (ii) all the information about $\mathcal{S}_a$ is
in any set of $n - 2t$ carriers, as we noted earlier. But in a set of $n - 2t$ carriers, no information
can be present in a subset of $2t$ carriers, and thus by theorem 8 (iii), the Hilbert space of
$n - 2t - 2t = n - 4t$ carriers must have a dimension greater than or equal to $d_a = K$. This
last assertion is equivalent to (73).
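
As a quick check of (73), consider the five qubit code mentioned above, taking $n = 5$, $t = 1$, $K = 2$ and $D = 2$ (one encoded qubit carried by five qubits; these values are the usual ones for that code and are assumed here rather than quoted from the text):

```latex
n = 5,\quad t = 1,\quad K = 2,\quad D = 2:
\qquad
4t + \frac{\log K}{\log D} = 4 + 1 = 5 \le n ,
\qquad
D^{\,n-4t} = 2^{1} = 2 \ge d_a = K .
```

The bound is saturated in this case: the single leftover carrier has exactly the minimal dimension required by theorem 8 (iii).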

Note how in carrying out this argument it is essential to distinguish between a pure
state pre-probability $|\Psi\rangle$ and a mixed state pre-probability $\rho$. The former is needed when
using theorem 8 (ii) to infer the presence of all the information about $\mathcal{S}_a$ in any collection of
$n - 2t$ carriers, given that it is absent from any collection of size $2t$. However, these $n - 2t$
carriers along with $\mathcal{S}_a$ form a system whose pre-probability is a density operator, and as a
consequence we cannot use the fact that no $2t$ of these carriers contain information to infer
that it must be present in a set of $n - 4t$ carriers, something that is (at least in general)
not true. By using theorem 8 (iii) instead of theorem 8 (ii), we correctly infer that the
leftover collection of $n - 4t$ carriers has a certain minimal size, not that it contains all the
information!

The foregoing discussion focussed on codes for which arbitrary errors in t or fewer carriers 
can be corrected. What of codes designed for the correction of errors of a more specific sort? 
Once again (71) applies, but only to a more specialized class of operators. Consider, for
example, the three-qubit code which is adequate for bit-flip errors, or p. 430 of [T^j .
Such errors can be represented by a Pauli $\sigma_x$ on a single qubit, and what (71) is telling us is
that no $\mathcal{X}$ information about any pair of qubit code carriers can be present in $\mathcal{S}_a$, where the
sample space $\mathcal{X}$ is the orthonormal basis $\{|x_1^\pm x_2^\pm\rangle\}$ if the carriers are 1 and 2; here $|x^\pm\rangle$ are
the eigenstates of $\sigma_x$.
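
Condition (71) for the three-qubit code can be verified directly. The sketch below assumes the usual code basis $|000\rangle$, $|111\rangle$ and takes the error operators, up to normalization, to be the identity together with a single $\sigma_x$ (or, for contrast, a single $\sigma_z$) on one carrier; the function names are hypothetical.

```python
from functools import reduce
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def on_site(op, site, n=3):
    """Operator `op` on carrier `site` (0-based), identity on the other carriers."""
    return reduce(np.kron, [op if m == site else I2 for m in range(n)])

def satisfies_71(code_basis, kraus, tol=1e-12):
    """Check <b^j|K_l^dagger K_m|b^k> = b_lm delta_jk for every pair (l, m)."""
    V = np.column_stack(code_basis)                 # columns are the |b^j>
    for Kl in kraus:
        for Km in kraus:
            M = V.conj().T @ Kl.conj().T @ Km @ V
            if not np.allclose(M, M[0, 0] * np.eye(V.shape[1]), atol=tol):
                return False
    return True

# Code basis |000>, |111>.
b0 = np.zeros(8)
b0[0] = 1.0
b1 = np.zeros(8)
b1[7] = 1.0

bit_flips = [np.eye(8)] + [on_site(X, m) for m in range(3)]
phase_flips = [np.eye(8)] + [on_site(Z, m) for m in range(3)]

print(satisfies_71([b0, b1], bit_flips))     # bit-flip errors satisfy (71)
print(satisfies_71([b0, b1], phase_flips))   # phase-flip errors do not
```

The failure for $\sigma_z$ errors reflects the fact that (71) holds here only for the specialized class of operators built from $\sigma_x$.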

The statement about absence at the channel input S a of certain types of information 
about some of the carriers can be misinterpreted if thought of in terms of some backwards- 
in-time "influence" which the carriers exert on the channel input. Instead, keep in mind that 
the real issues have to do with statistical correlations between states-of-affairs at different 
times as represented in appropriate sample spaces or frameworks. Error recovery depends, 
of course, on information being present in appropriate locations, and quantum no-cloning 
(loosely speaking) allows us to connect the presence of information in one place with its 
absence someplace else. Presence and absence should always be thought of in terms of 
statistical correlations. 

VI Classification of Channel and Entanglement Problems

The fact that the properties of a quantum channel can be deduced from those of a 
channel ket, and likewise the properties of an entangled mixed state from those of a suitable 
purification, suggest the possibility of classifying these two types of quantum information 
problem in a single scheme based on pure states of a p-part system. Of course, for each p one 
should then introduce additional categories with some information-theoretical significance. 
The dimensions of the p subsystems are meaningful parameters, and other features, such as 
the "all" or "nothing" character of certain types of information, could assist in classifying 
particular cases. The motivation behind such a classification scheme is to have a useful 
way of comparing different types of experimental phenomena or theoretical models, one that 
may suggest analogies in instances where these are not immediately evident. Seeing how 
it relates to other problems does not, of course, automatically provide a solution or even 
a better way of thinking about a particular question, but could in some cases suggest an 
alternative approach, or allow the application of a different set of ideas. 

There are two reasons for preferring a classification using entangled states to one based on 
channels. First, every channel problem (of the sort under discussion) maps in a simple and 
natural way to an entanglement problem, while the reverse is subject to some qualifications, 
as discussed in Sec. II E. Second, entangled states have a higher "conceptual symmetry"; for
example, it is more natural to ask what happens if two subsystems of a bipartite system are 
interchanged than what will occur if the channel is, so-to-speak, operated in a time-reverse 
mode. The utility of pure states as against mixed states is less obvious, but the results in 
Sec. IV suggest that this may lead to a simpler classification using the location of quantum
information, assuming that is a useful way to proceed. 

Now let us consider some preliminary results. The Schmidt expansion for bipartite pure 
states provides a complete classification, up to local unitaries, for p = 2, and the by now 
standard pure-state entanglement measure has proven itself a remarkably useful tool for 
their study. Noiseless quantum channels described by unitary time development fall in this
category, and correspond to fully-entangled states.

The difficult problems start with p = 3, which includes both mixed-state entanglement 
and the standard model for noisy quantum channels. Classifying the two together imme- 
diately raises the question of how various mixed-state entanglement measures, [H EH] , may 
be related to the many different types of quantum channel capacity that have been defined 
[HUH]. There is a brief discussion in Sec. 6.3.3 of [T], which notes that the equivalence of 
the (simple) quantum capacity and a one-way distillation entanglement measure was demon- 
strated in [23]. But we know of no systematic attempt to relate objects which ought to have 
a close connection. Or, if they do not have a close connection, why is that? 

If one further classifies p = 3 problems according to the sizes of the subsystems, the 
obvious starting point is pure states of three qubits. Some one-qubit noisy channel problems 
fall in this category, as does the simplest cloning problem. Leaving aside cases of a 
product state of one qubit with an entangled state of the other two, which in some sense 
belong to the p = 2 class, the remaining states fall into two classes, "W" and "GHZ," under 
the equivalence generated by 

$$|\Psi'\rangle = (A \otimes B \otimes C)|\Psi\rangle, \tag{74}$$

where A, B, and C are nonsingular operators 02]. This is a very interesting result which 
does not seem to have been generalized to larger subsystems. However, even for qubits 
it may not represent a complete classification scheme, for operations of the form (74) do 
not, in general, preserve all the properties that are of interest from an information-theoretic 
perspective (in which $|\Psi\rangle$ functions as a pre-probability). 
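
A small numerical example may make the last point concrete. The Python sketch below is a hedged illustration, not taken from the original: the helper names `apply_local` and `reduced_a` and the particular operator `A = diag(1, 2)` are assumptions chosen for simplicity. It applies a nonsingular local operator of the form (74) to a GHZ ket and compares the reduced density operator on the first qubit before and after.

```python
import math

def apply_local(A, B, C, psi):
    """Return the normalized ket (A (x) B (x) C)|psi> for three qubits.

    psi[i][j][k] is the amplitude of |i>|j>|k>; A, B, C are 2x2 matrices.
    """
    idx = range(2)
    out = [[[sum(A[i][x] * B[j][y] * C[k][z] * psi[x][y][z]
                 for x in idx for y in idx for z in idx)
             for k in idx] for j in idx] for i in idx]
    norm = math.sqrt(sum(abs(out[i][j][k]) ** 2
                         for i in idx for j in idx for k in idx))
    return [[[out[i][j][k] / norm for k in idx] for j in idx] for i in idx]

def reduced_a(psi):
    """Reduced density operator on the first qubit (trace over the other two)."""
    idx = range(2)
    return [[sum(psi[i][j][k] * complex(psi[m][j][k]).conjugate()
                 for j in idx for k in idx)
             for m in idx] for i in idx]

s = 1.0 / math.sqrt(2.0)
ghz = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
ghz[0][0][0] = s
ghz[1][1][1] = s

identity = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 2.0]]          # nonsingular but not unitary (illustrative choice)

rho_before = reduced_a(ghz)
rho_after = reduced_a(apply_local(A, identity, identity, ghz))
```

Before the operation the reduced operator is half the identity; afterwards it is diag(1/5, 4/5). The transformed ket is still in the "GHZ" class, but its reduced density operator on the first qubit is no longer proportional to the identity, so a property relevant to the location of information has changed.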

The general one-qubit noisy quantum channel falls in the p = 3 category, with two 
subsystems (entrance and exit of the channel) of dimension 2, and one (the environment) 
of dimension 4. A quite general description of such channels has been worked out in [43], 
and this work can and should be regarded as a significant step in classifying a large and 
important set of tripartite pure states. There are, on the other hand, entangled states which 
escape this classification (for the reasons explained in Sec. III E), and it would be interesting 
if the methods used there could be extended to these as well. 

A unitary transformation mapping a bipartite system to itself can be thought of as a 
p = 4 problem, equivalent to a fully-entangled state between two bipartite systems. In 
the case of two qubits such unitaries can be written down explicitly in terms of three real 
parameters 02] , up to local unitaries on the individual qubits, and this provides a convenient 
description of an important class of p = 4 pure states in which each subsystem has dimension 
two. Beyond this very little seems to be known at present about the four qubit problem. 
A one-qubit channel with a mixed-state environment falls in this category, as explained in 
Sec. V A. The entanglement of purification introduced in [To] is an example of a p = 4 
problem not limited to qubits, as is the general problem of a channel corresponding to a 
mixed-state environment. 

As noted in Sec. V C, a quantum code with n carriers falls in the p = n + 1 category 
of states for which there is an absence of correlations between one particular subsystem 
(the channel entrance) and various collections of other subsystems. Relating quantum codes 
to more general problems of multipartite entanglement is an interesting and challenging 
problem [20]. 

VII Conclusion 



VII A Summary 

The fundamental idea underlying the duality discussed in Sec. II is that the correlations 
of events at different times that characterize a quantum channel are "the same thing" as the 
correlations of properties of an entangled quantum system at different points in space. At 
the mathematical level the correspondence is expressed by a simple partial transpose (|2H|) 
that carries the dynamical density operator R into the transition operator Q representing 
the channel superoperator. In physical terms the duality says that the correlations which 
express the location of information about one quantum system in another are of basically 
the same nature, whether they refer to properties of a single system at two different times, 
or to two different systems at the same time. This is well-established in classical information 
theory, where the same tools are used for both circumstances, and it works equally well in 
quantum systems given appropriate sample spaces or frameworks, as explained in Sec. III. 

The nonclassical "peculiarities" of quantum information emerge when one uses a single 
pre-probability, either a pure state or a density operator, or their counterparts for a quantum 
channel, to generate probability distributions and thus correlations for a variety of different, 
incompatible frameworks (sample spaces). It is here that "no-cloning" plays a central role, 
and the eight all-or-nothing theorems of Sec. IV are intended to make that idea more precise 
and more widely applicable. While the theorems are expressed in entanglement language, 
the duality allows their immediate application to quantum channels. In many cases the 
results are more precise (and in others their derivation is easier) when the pre-probability is 
a pure rather than a mixed state, which in the case of a quantum channel means a channel 
ket rather than a dynamical operator. This suggests that channel kets are a useful tool for 
analyzing the properties of noisy quantum channels, and the applications in Sec. V bear 
this out. Whether pure states are equally advantageous for classifying entangled states and 
quantum channels in a single scheme remains to be demonstrated, but the preliminary results 
in Sec. VI are encouraging. 

VII B Open questions 

The eight all-or-nothing theorems of Sec. IV provide a useful first step in describing in a 
systematic way how information can be divided up or spread out over an entangled quantum 
system. But one suspects there remains much more to be said, both about bipartite and 
tripartite systems, and also about systems with p ≥ 4 parts. In addition, every qualitative 
theorem of the type found in Sec. IV ought to be the limiting case of one or perhaps several 
quantitative theorems in which the complete presence or absence of information is replaced by 
quantitative measures (Shannon entropies are an obvious, but not the unique, possibility) 
and constraints are provided in the form of rigorous inequalities, or perhaps even equalities, 
if one is lucky. While some ideas of this sort have been put forward, e.g., jl^l EZI, a great 
deal more could be done. 

To be sure, several entanglement measures have been proposed for bipartite mixed 
states [TJ and to a lesser extent for systems with p ≥ 3 parts; see [20] and the 
references given there. But rarely do these have a specific information-theoretical content or 
basis, and it is an open question whether, and if so how, they can be understood in such 
terms, i.e., related to statistical correlations forming part of a consistent probabilistic 
description of a quantum system. To be sure, entanglement measures can be useful even if they 
have no connection to information theory, but if there is such a connection, understanding 
what it is could be a useful contribution to the subject. 

Discussions of quantum channel capacities seem better anchored in an information- 
theoretic framework than those concerning entanglement measures, though perhaps more 
thought should be given as to how to translate "classical," which occurs rather frequently 
in such discussions, into appropriate quantum mechanical terms; we no longer live in a clas- 
sical world! Relating these capacities to entanglement measures seems at present a largely 
open question, and answering it could make a valuable contribution to understanding both 
entanglement and noisy channels. 

The task of classifying entangled pure states of p-part systems in the manner suggested 
in Sec. VI can be regarded as complete for p = 2, but for p = 3 it has just begun, and very 
little is known about p ≥ 4 systems apart from work on quantum codes. Extending the latter 
to more general entangled states could make a significant contribution to our understanding 
of multipartite entanglement, which at present is quite limited. 
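
A Appendix. Proofs of theorems in Sec. IV 

Theorem 1 (i). Expand $|\Psi\rangle$ in Schmidt form, 

$$|\Psi\rangle = \sum_j \sqrt{q_j}\, |a^j\rangle \otimes |c^j\rangle, \tag{A.1}$$

and let $J$ be the collection of $j$ values for which $q_j > 0$. For the $\{A^l\}$ information to be 
absent from $S_c$, it must be the case, see (jSHJ), that 

$$\mathrm{Tr}_a(A^l \Psi) = \sum_{j,k \in J} \sqrt{q_j q_k}\, \langle a^k|A^l|a^j\rangle\, |c^j\rangle\langle c^k| \tag{A.2}$$

is proportional to 

$$\Psi_c = \sum_{j \in J} q_j\, |c^j\rangle\langle c^j|, \tag{A.3}$$

which means that 

$$\langle a^j|A^l|a^k\rangle = \alpha_l\, \delta_{jk} \tag{A.4}$$

for all $j$ and $k$ in $J$. This is the same as (43), which is the same as (42). 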

Theorem 1 (ii). Expand $|\Psi\rangle$ in the orthonormal basis (see (JBJ)): 

$$|\Psi\rangle = \sum_j |a^j\rangle \otimes |\gamma^j\rangle. \tag{A.5}$$

The requirement that no information about $\{|a^j\rangle\}$ be in $S_c$ means that all the $|\gamma^j\rangle$ must be 
proportional to each other, and thus to a single ket $|\gamma\rangle$, which means that $|\Psi\rangle$ is of the form 
(44). 

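Theorem 1 (iii). The "if" part is obvious. To prove that (45) holds if all information 
about $S_a$ is absent from $S_c$, let $\{|a^j\rangle\}$ and $\{|c^l\rangle\}$ be bases in which $\rho_a$ and $\rho_c$ are diagonal, 

$$\rho_a = \sum_j p_j\, |a^j\rangle\langle a^j|, \qquad \rho_c = \sum_l q_l\, |c^l\rangle\langle c^l|, \tag{A.6}$$

and write 

$$\rho = \sum_{jk} \sum_{lm} \langle a^j c^l|\rho|a^k c^m\rangle \bigl(|a^j\rangle\langle a^k| \otimes |c^l\rangle\langle c^m|\bigr). \tag{A.7}$$

The absence of all information implies that 

$$\mathrm{Tr}_a(A\rho) = \langle A\rangle\, \rho_c \tag{A.8}$$

for any operator $A$ on $\mathcal{H}_a$; see (36), and note that the collection of all projectors is an 
operator basis for the space of operators on $\mathcal{H}_a$. Insert $A = |a^k\rangle\langle a^j|$ in (A.8), and use (A.7) to 
evaluate the left side and (A.6) the right. The conclusion is that 

$$\langle a^j c^l|\rho|a^k c^m\rangle = p_j q_l\, \delta_{jk}\, \delta_{lm}, \tag{A.9}$$

which is (45). 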

Theorem 2 (i). Write $\hat A^l = \mathrm{Tr}_a(\Psi A^l)$ for the operator on $\mathcal{H}_b$ associated with $A^l$. If the 
$\mathcal{A}$ information is present in $S_b$, (35) implies that 

$$\mathrm{Tr}(\Psi A^l B^m) = \mathrm{Tr}_b(\hat A^l B^m) = \delta_{lm}\, \mathrm{Tr}_b(\hat A^l), \tag{A.10}$$

since $\langle A^l\rangle = \mathrm{Tr}_b(\hat A^l)$. If $P$ and $Q$ are positive operators such that $\mathrm{Tr}(PQ) = 0$, then $PQ = 0$. 
Using this and the fact that the $B^m$ are projectors, so that $\hat A^l = \hat A^l B^m + \hat A^l (I_b - B^m)$, one 
sees that (A.10) implies that 

$$B^m \hat A^l B^m = \delta_{lm}\, \hat A^l, \tag{A.11}$$

and (46) is a consequence of $B^l B^m = \delta_{lm} B^l$. Conversely, (46) implies that one can 
simultaneously diagonalize the collection $\{\hat A^l\}$ and choose the $B^l$ projecting onto appropriate blocks 
in such a way that (A.11), and therefore (A.10) and (35), are satisfied. 

Theorem 2 (ii). Choose an orthonormal basis $\{|a^j\rangle\}$ in which the $\{A^l\}$ are diagonal, and 
expand $|\Psi\rangle$ in this basis, (49), without assuming it is in Schmidt form. Then 

$$\Psi_a = \sum_{jk} \langle\beta^k|\beta^j\rangle\, |a^j\rangle\langle a^k| \tag{A.12}$$

and the operator $\hat A^l = \mathrm{Tr}_a(\Psi A^l)$ on $\mathcal{H}_b$ is 

$$\hat A^l = \sum_{j \in J_l} |\beta^j\rangle\langle\beta^j|, \tag{A.13}$$

where $J_l$ is the collection of $j$ values for which $A^l|a^j\rangle = |a^j\rangle$. One can show that $\Psi_a$ commutes 
with all the $A^l$ if and only if $\langle\beta^k|\beta^j\rangle = 0$ whenever $j \in J_l$ and $k \in J_m$ with $m \neq l$. But this 
last is equivalent to (46). If the $A^l$ project onto one-dimensional subspaces, then $\langle\beta^k|\beta^j\rangle = 0$ for 
$j \neq k$, so (49) is in Schmidt form. 

Theorem 2 (iii). Purify $\rho$ to a ket $|\Psi\rangle \in \mathcal{H}_{abc}$. Use the fact that the $\mathcal{A}$ information is 
present in $S_{bc}$, and apply part (ii) of the theorem with $S_{bc}$ in place of $S_b$ to infer that $\Psi_a = \rho_a$ 
commutes with all the $A^l$. 

Theorem 3. Part (i) is an immediate consequence of theorem 2 (ii), for it is only multiples of 
the identity that commute with all projectors. The proofs of (ii) and (iii) are given below, 
following that of theorem 8. 

Theorem 4. By theorem 2, $\rho_a$ or $\Psi_a$ must commute with all the $\{A^j\}$ and all the $\{\bar A^k\}$, 
and must therefore, by the definition of strong incompatibility, be a multiple of $I_a$. The final 
statement is a consequence of theorem 3. 

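Theorem 5. Let $\{|a^j\rangle\}$ and $\{|c^k\rangle\}$ be orthonormal bases of $\mathcal{H}_a$ and $\mathcal{H}_c$ which diagonalize 
$\Psi_a$ and $\Psi_c$, 

$$\Psi_a = \sum_j p_j\, |a^j\rangle\langle a^j|, \qquad \Psi_c = \sum_k q_k\, |c^k\rangle\langle c^k|, \tag{A.14}$$

and expand $|\Psi\rangle$ in these bases: 

$$|\Psi\rangle = \sum_{jk} |a^j\rangle \otimes |\beta^{jk}\rangle \otimes |c^k\rangle. \tag{A.15}$$

The condition $\Psi_{ac} = \Psi_a \otimes \Psi_c$ expressing the absence of all $S_a$ information from $S_c$, theorem 1 
(iii), implies that 

$$\langle\beta^{j'k'}|\beta^{jk}\rangle = p_j q_k\, \delta_{jj'}\, \delta_{kk'}. \tag{A.16}$$

Therefore if we restrict our attention to the $j \in J$ and $k \in K$ for which $p_j > 0$ and $q_k > 0$, 
we can construct an orthonormal set 

$$|b^{jk}\rangle = |\beta^{jk}\rangle / \sqrt{p_j q_k} \tag{A.17}$$

of kets in $\mathcal{H}_b$, and rewrite (A.15) in the form 

$$|\Psi\rangle = \sum_{jk} \sqrt{p_j q_k}\; |a^j\rangle \otimes |b^{jk}\rangle \otimes |c^k\rangle. \tag{A.18}$$

The spaces $\mathcal{H}_d$ and $\mathcal{H}_e$ are then defined as having orthonormal bases $\{|d^j\rangle\}$ and $\{|e^k\rangle\}$ such 
that 

$$|b^{jk}\rangle = |d^j\rangle \otimes |e^k\rangle, \tag{A.19}$$

so that $|\Psi\rangle$ is of the form (56) with 

$$|\chi\rangle = \sum_j \sqrt{p_j}\; |a^j\rangle \otimes |d^j\rangle, \qquad |\psi\rangle = \sum_k \sqrt{q_k}\; |c^k\rangle \otimes |e^k\rangle. \tag{A.20}$$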

Theorem 6 (i). Expand $|\Psi\rangle$ in the orthonormal basis $\{|a^j\rangle\}$, 

$$|\Psi\rangle = \sum_j |a^j\rangle \otimes |\zeta^j\rangle, \tag{A.21}$$

with $|\zeta^j\rangle \in \mathcal{H}_{bc}$, and write 

$$\Psi_{ac} = \sum_{jk} |a^j\rangle\langle a^k| \otimes \zeta_c^{jk}; \qquad \zeta^{jk} := |\zeta^j\rangle\langle\zeta^k|. \tag{A.22}$$

If the $\{|a^j\rangle\}$ information is in $S_b$, then by theorem 2 (i) 

$$\zeta_b^{jj}\, \zeta_b^{kk} = 0 \ \text{ for } j \neq k, \tag{A.23}$$

where, following our usual notation, $\zeta_b^{jj} = \mathrm{Tr}_c(\zeta^{jj})$. Now apply (B.3) in App. B with $a$ 
replaced by $b$, $b$ replaced by $c$, $|e\rangle = |\zeta^j\rangle$ and $|g\rangle = |\zeta^k\rangle$, to conclude that (A.23) holds if and 
only if 

$$\zeta_c^{jk} = 0 \ \text{ for } j \neq k. \tag{A.24}$$

(Note that $\mathrm{Tr}_c(C C^\dagger) = 0$ implies that $C = 0$.) But (A.24) inserted in (A.22) implies (57) 
with $\Gamma^j = \zeta_c^{jj}$. Conversely, (57) implies (A.24), which implies (A.23), which, using theorem 2 
(i), implies that the $\{|a^j\rangle\}$ information is in $S_b$. 

Theorem 6 (ii). Let $\{|a^j\rangle\}$ be any basis in which the $A^k$ in (58) are diagonal. Then (57) is 
a consequence of (58): simply write each $A^k$ as a sum of a suitable collection of $[a^j]$. Thus by 
(i), the $\{|a^j\rangle\}$ and, a fortiori, the $\{A^k\}$ information is in $S_b$. For a compatible decomposition 
$\bar{\mathcal{A}} = \{\bar A^l\}$, use a basis $\{|a^j\rangle\}$ in which both these and the $\{A^k\}$ are diagonal. 

Theorem 7 (i). Purify $\rho$ to $|\Psi\rangle \in \mathcal{H}_{abcd}$, and apply theorem 6 (i) with $c$ replaced by $cd$ to 
conclude that $\Psi_{acd}$ is of the form (57) with operators $\Gamma^j$ on $\mathcal{H}_{cd}$. Now trace both sides over 
$\mathcal{H}_d$ to get the equivalent of (57). 

Theorem 7 (ii). Multiply both sides of (59) by $[\bar a^k]$. First trace over $\mathcal{H}_a$ and use the 
definition of mutually unbiased bases in (J3J) to conclude that the resulting operator (on $\mathcal{H}_c$) 
does not depend on $k$, so the $\{|\bar a^k\rangle\}$ information is absent from $S_c$ according to the definition 
in Sec. III C; see the comment following (f3H|). Next, trace over $\mathcal{H}_c$ to get (50). 
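
The first step of this argument can be checked numerically in a small case. The Python sketch below is only an illustration under explicit assumptions: it supposes that (59) has the block form $\rho_{ac} = \sum_j [a^j] \otimes \Gamma^j$ suggested by part (i), with $[a^j]$ the projector onto $|a^j\rangle$; the function names and the particular operators in `gammas` are hypothetical. With that form, the partial trace over $\mathcal{H}_a$ of $[\bar a^k]\rho_{ac}$ equals $\sum_j |\langle \bar a^k|a^j\rangle|^2\, \Gamma^j$.

```python
import math

def overlap_weights(basis, mub):
    """w[k][j] = |<abar^k|a^j>|^2 for two orthonormal bases given as lists of kets."""
    return [[abs(sum(complex(u).conjugate() * v for u, v in zip(bk, aj))) ** 2
             for aj in basis] for bk in mub]

def conditional_operator(weights_k, gammas):
    """Tr_a([abar^k] sum_j [a^j] (x) Gamma^j) = sum_j w[k][j] Gamma^j."""
    n = len(gammas[0])
    return [[sum(w * g[r][c] for w, g in zip(weights_k, gammas))
             for c in range(n)] for r in range(n)]

s = 1.0 / math.sqrt(2.0)
z_basis = [[1.0, 0.0], [0.0, 1.0]]    # the basis {|a^j>} for a qubit
x_basis = [[s, s], [s, -s]]           # a basis mutually unbiased with z_basis

# Hypothetical positive operators Gamma^j on H_c whose traces sum to 1.
gammas = [[[0.3, 0.1], [0.1, 0.2]],
          [[0.1, 0.0], [0.0, 0.4]]]

w = overlap_weights(z_basis, x_basis)
ops = [conditional_operator(w[k], gammas) for k in range(2)]
```

Because every weight equals $1/d_a = 1/2$ for mutually unbiased bases, the two operators in `ops` coincide, which is the sense in which the result does not depend on $k$.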

Theorem 8 (i). Given an arbitrary orthonormal basis $\mathcal{A}$ of $\mathcal{H}_a$, one can always find 
another basis $\bar{\mathcal{A}}$ with $\mathcal{A}$ and $\bar{\mathcal{A}}$ mutually unbiased. As the $\bar{\mathcal{A}}$ information is, by assumption, 
in $S_b$, the $\mathcal{A}$ information cannot be in $S_c$, by theorem 7 (ii). 

Theorem 8 (ii). All the information about $S_a$ is in $S_{bc}$, so $\Psi_a = I_a/d_a$ by theorem 3 (i). 
But as there is no information about $S_a$ in $S_c$, theorem 5 tells us $|\Psi\rangle$ is of the form (56), 
with $\chi_a = \Psi_a = I_a/d_a$, and therefore, once again invoking theorem 3 (i), all the information 
about $S_a$ is in $S_b$. 

Theorem 8 (iii). (The following argument is from p. 569 of ^H], where it is ascribed to 
[45], and it makes use of some well-known properties of the von Neumann entropy 

$$S(\rho) = -\mathrm{Tr}(\rho \log \rho); \tag{A.25}$$

see, e.g., pp. 513 and 515 of [ISj.) Upon purifying $\rho$ to $|\Psi\rangle \in \mathcal{H}_{abcd}$ one finds that 

$$S(\Psi_a) + S(\Psi_c) = S(\Psi_{ac}) = S(\Psi_{bd}) \leq S(\Psi_b) + S(\Psi_d). \tag{A.26}$$

The first equality is a consequence of the absence of information about $S_a$ in $S_c$, thus $\Psi_{ac} = 
\Psi_a \otimes \Psi_c$ by theorem 1 (iii). The second equality reflects the fact that $|\Psi\rangle$ is a pure state on 
$\mathcal{H}_{ac} \otimes \mathcal{H}_{bd}$, and the final inequality is a standard result for a density operator on a tensor 
product. Since all information about $S_a$ is in $S_{bc}$, it must be absent from $S_d$ by part (i) of 
this theorem, so we can interchange the roles of $S_c$ and $S_d$ in (A.26) to obtain 

$$S(\Psi_a) + S(\Psi_d) \leq S(\Psi_b) + S(\Psi_c), \tag{A.27}$$

and by adding this to (A.26) arrive at 

$$S(\Psi_a) \leq S(\Psi_b). \tag{A.28}$$

By theorem 3 (i) (replace $b$ by $bcd$) we know that $\Psi_a = I_a/d_a$, so the left side of (A.28) is 
$\log d_a$, and as the right side cannot exceed $\log d_b$, it follows that $d_b \geq d_a$. 

Theorem 3 (ii) and (iii). Purify $\rho$ to $|\Psi\rangle \in \mathcal{H}_{abc}$. If all information about $S_a$ is in $S_b$ (for $\rho$ 
and for $\Psi$), then by theorem 8 (i) there is none in $S_c$, so by theorem 5 $|\Psi\rangle$ has the product 
structure of (56), where in addition $|\chi\rangle$ must be maximally (fully) entangled, so we arrive at 
the form stated in the theorem. If, on the other hand, that form is correct, then $\Psi_a = I_a/d_a$, and all 
the information about $S_a$ is in $S_d$, and therefore in $S_b$. To prove theorem 3 (iii), note that if $|\Psi\rangle$ 
has this form and $d_e$ is 2 or more, $I_e$ has a nontrivial decomposition, and the corresponding information 
obviously cannot be in $S_a$. Thus if all the information about $S_b$ is in $S_a$, it is the case that 
$d_e = 1$ and $\mathcal{H}_{de}$ is the same as $\mathcal{H}_d$, and the latter is the same as $\mathcal{H}_b$, for were it a proper 
subspace, $\Psi_b$ would not be proportional to $I_b$. 

B Appendix. Four entangled kets 

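Let 

$$D^{ef} = |e\rangle\langle f|, \qquad D_a^{ef} = \mathrm{Tr}_b(D^{ef}), \qquad D_b^{ef} = \mathrm{Tr}_a(D^{ef}) \tag{B.1}$$

denote the dyad and its partial traces for two kets $|e\rangle$ and $|f\rangle$ on $\mathcal{H}_{ab} = \mathcal{H}_a \otimes \mathcal{H}_b$. 

Theorem. Let $|e\rangle$, $|f\rangle$, $|g\rangle$, $|h\rangle$ be any four kets on $\mathcal{H}_{ab}$. Then 

$$\mathrm{Tr}_a(D_a^{ef} D_a^{gh}) = \mathrm{Tr}_b(D_b^{eh} D_b^{gf}). \tag{B.2}$$

In particular, if $|f\rangle = |e\rangle$ and $|h\rangle = |g\rangle$, then 

$$\mathrm{Tr}_a(D_a^{ee} D_a^{gg}) = \mathrm{Tr}_b(D_b^{eg} D_b^{ge}). \tag{B.3}$$

Proof. Let $\{|a^j\rangle\}$ be a fixed orthonormal basis of $\mathcal{H}_a$, and expand each ket in the form 

$$|w\rangle = \sum_j |a^j\rangle \otimes |w^j\rangle. \tag{B.4}$$

Direct calculation shows that the left and right sides of (B.2) are both equal to 

$$\sum_{jk} \langle f^k|e^j\rangle \langle h^j|g^k\rangle. \tag{B.5}$$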
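
The identity (B.2) and its special case (B.3) can also be checked numerically. The following Python sketch is a sanity check under stated assumptions, not a substitute for the proof: the helper names, the dimensions $d_a = 2$, $d_b = 3$, and the seeded random kets are all illustrative choices. A ket on $\mathcal{H}_{ab}$ is stored as a nested list with `ket[j][b]` the amplitude of $|a^j\rangle \otimes |b^b\rangle$.

```python
import random

def random_ket(da, db, rng):
    """An unnormalized random complex ket on H_a (x) H_b, stored as ket[j][b]."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(db)]
            for _ in range(da)]

def dyad_a(e, f):
    """D_a^{ef} = Tr_b |e><f|, a (da x da) matrix."""
    return [[sum(ej[b] * fk[b].conjugate() for b in range(len(ej))) for fk in f]
            for ej in e]

def dyad_b(e, f):
    """D_b^{ef} = Tr_a |e><f|, a (db x db) matrix."""
    db = len(e[0])
    return [[sum(e[j][b] * f[j][c].conjugate() for j in range(len(e)))
             for c in range(db)] for b in range(db)]

def trace_product(X, Y):
    """Tr(XY) for two square matrices of the same size."""
    n = len(X)
    return sum(X[r][c] * Y[c][r] for r in range(n) for c in range(n))

rng = random.Random(0)  # fixed seed so the check is reproducible
e, f, g, h = (random_ket(2, 3, rng) for _ in range(4))

lhs = trace_product(dyad_a(e, f), dyad_a(g, h))      # left side of (B.2)
rhs = trace_product(dyad_b(e, h), dyad_b(g, f))      # right side of (B.2)
lhs_b3 = trace_product(dyad_a(e, e), dyad_a(g, g))   # left side of (B.3)
rhs_b3 = trace_product(dyad_b(e, g), dyad_b(g, e))   # right side of (B.3)
```

The kets are deliberately left unnormalized, since the theorem holds for any four kets; `lhs` and `rhs` should agree up to rounding, as should `lhs_b3` and `rhs_b3`.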